Consistency of the mean and the principal components
of spatially distributed functional data

Abstract

This paper develops a framework for the estimation of the functional mean and
the functional principal components when the functions form a random field. More
specifically, the data we study consist of curves $X(\mathbf{s}_k; t)$, $t \in [0, T]$, observed at
spatial points $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_N$. We establish conditions for the sample average (in
space) of the $X(\mathbf{s}_k)$ to be a consistent estimator of the population mean function,
and for the usual empirical covariance operator to be a consistent estimator of
the population covariance operator. These conditions involve an interplay of the
assumptions on an appropriately defined dependence between the functions $X(\mathbf{s}_k)$
and the assumptions on the spatial distribution of the points $\mathbf{s}_k$. The rates of
convergence may be the same as for iid functional samples, but generally depend
on the strength of dependence and appropriately quantified distances between the
points $\mathbf{s}_k$. We also formulate conditions for the lack of consistency. The general
results are specialized to functional spatial models of practical interest. They are
established using an appropriate quadratic loss function which we can bound by
terms that reflect the assumptions on the spatial dependence and the distribution
of the points. This technique is broadly applicable to all statistics obtained by
simple averaging of functional data at spatial locations.

1 Introduction

This paper develops aspects of theory for functional data observed at spatial locations.
The data consist of curves $X(\mathbf{s}_k; t)$, $t \in [0, T]$, observed at spatial points $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_N$.
Such data structures are quite common, but often the spatial dependence and the
spatial distribution of the points are not taken into account. A well-known
example is the Canadian temperature and precipitation data used as a running example in
Ramsay and Silverman (2005). The annual curves are available at 35 locations, some of
which are quite close, so that the curves look very similar, while others are very remote, with
notably different curves. Ramsay and Silverman (2005) use the functional principal
components and the functional linear model as exploratory tools. Another example of this
type is the Australian rainfall data set, recently used by Delaigle and Hall (2010), which
consists of daily rainfall measurements from 1840 to 1990 at 191 Australian weather
stations. Due to the importance of such data structures, it is useful to investigate when the
commonly used techniques designed for iid functional data retain their consistency for
spatially distributed data, and when they fail. We establish conditions for consistency, or
lack thereof, for the functional mean and the functional principal components. Our
conditions combine the spatial dependence of the curves $X(\mathbf{s}_k; \cdot)$ and the distribution of the
data locations $\mathbf{s}_k$. It is hoped that the general framework we propose will be useful in the
development of asymptotic arguments for statistical models involving spatial functional
data.

An important example of data that fall into our framework are pollution curves:
$X(\mathbf{s}_k; t)$ is the concentration of a pollutant at time $t$ at location $\mathbf{s}_k$. Data of this type were
studied by Kaiser et al. (2002). A functional framework might be convenient because such
data are typically available only at sparsely distributed time points $t_j$, which can be
different at different locations. Another interesting example is given by snow water curves measured
at several dozen locations in every state over many decades. Such data have been studied
in the spatial framework, e.g. Carroll et al. (1995) and Carroll and Cressie (1996), but
useful insights can be gained by studying the whole curves reflecting the temporal
dynamics. In many studies, $X(\mathbf{s}_k; t)$ is the count at time $t$ of disease cases, where $\mathbf{s}_k$ represents
an average location in an areal model.

The data set that most directly motivated this research consists of the curves of the
so-called F2-layer critical frequency foF2. Three such curves are shown in Figure 1.1. In
principle, foF2 curves are available at close to 200 locations throughout the globe, but
sufficiently complete data are available at only 30-40 locations which are very unevenly
spread; for example, there is a dense network of observatories over Europe and practically
no data over the oceans. The study of this data set has been motivated by the hypothesis
of Roble and Dickinson (1989), who suggested that the increasing amounts of greenhouse
gases should lead to global cooling in the mesosphere and thermosphere, as opposed to the
global warming in the lower troposphere. Rishbeth (1990) pointed out that such cooling would
result in a thermal contraction and the global lowering of the ionospheric peak densities,
which can be computed from the critical frequency foF2. The last twenty years have seen
very extensive research in this area, see Lastovicka et al. (2008) for a partial overview.
One of the difficulties is in finding a global trend for curves which appear to exhibit trends
in opposing directions over various regions. Ulich et al. (2003) stressed that to make
any trends believable, a suitable statistical modeling and a proper treatment of "errors
and uncertainties" are called for. Space physics data measured at terrestrial observatories
always come in the form of temporal curves at fixed spatial locations. Maslova et al.
(2009, 2010a, 2010b) used the tools of functional data analysis to study such data, but
the spatial dependence of the curves was not fully exploited.

FIGURE 1.1 F2-layer critical frequency curves at three locations. Top to bottom (latitude in
parentheses): Yakutsk (62.0), Yamagawa (31.2), Manila (14.7). The functions must be divided
by a deterministic function of the latitude to obtain a stationary field.

There has not been much research on fundamental properties of spatially distributed
functional data. Delicado et al. (2010) review recent contributions to the methodology
for spatially distributed functional data. For geostatistical functional data, several
exploratory approaches to kriging have been proposed. Typically fixed basis expansions
are used, see Yamanishi and Tanaka (2003) and Bel et al. (2010). A general theoretical
framework has to address several problems. The first issue is the dimensionality of the
index space. While in time series analysis the process is indexed by an equispaced scalar
parameter, we need here a $d$-dimensional index space. For model building this makes a big
difference since the dynamics and dependence of the process have to be described in all
directions, and the typical recurrence equations used in time series cannot be employed.
The model building is further complicated by the fact that the index space is often
continuous (geostatistical data). Rather than defining a random field $\{\xi(\mathbf{s}); \mathbf{s} \in \mathbb{R}^d\}$ via
specific model equations, dependence conditions are imposed, in terms of the decay of the
covariances or using mixing conditions. Another feature peculiar to random field theory
is the design of the sampling points; the distances between them play a fundamental role.
Different asymptotics hold in the presence of clusters and for sparsely distributed points.
At least three types of point distributions have been considered (Cressie (1993)). When
the region $R_N$ where the points $\{\mathbf{s}_{i,N}; 1 \le i \le N\}$ are sampled remains bounded,
we are in the so-called infill domain sampling case. Classical asymptotic results, like the
law of large numbers or the central limit theorem, will usually fail, see Lahiri (1996). The
other extreme situation is described by the increasing domain sampling. Here a minimum
separation between the sampling points $\{\mathbf{s}_{i,N}\} \in R_N$ for all $i$ and $N$ is required. This is
of course only possible if $\mathrm{diam}(R_N) \to \infty$. We shall also explore the nearly infill situation
studied by Lahiri (2003) and Park et al. (2009). In this case the domain of the sampling
region becomes unbounded ($\mathrm{diam}(R_N) \to \infty$), but at the same time the number of sites
in any given subregion tends to infinity, i.e. the points become more dense. These issues
are also studied by Zhang (2004), Loh (2005), Lahiri and Zhu (2006) and Du et al. (2009).
We formalize these concepts in Sections 3 and 4. Finally, the interplay of the geostatistical
spatial structure and the functional temporal structure must be cast into a workable
framework.



For the reasons explained above, the approach of Hormann and Kokoszka (2010),
who developed a framework for estimation and testing for functional time series, is
totally inappropriate for functional spatial fields. The starting point for the theory of
Hormann and Kokoszka (2010) is the representation $X_k = f(\varepsilon_k, \varepsilon_{k-1}, \ldots)$ of a function
$X_k$ in terms of iid error functions $\varepsilon_k$. While all time series models used in practice
admit such a representation, no analogous representations exist for geostatistical spatial data.
(Even though not widely used, spatial autoregressive processes have been proposed, but
no Volterra type expansions have been developed for them.)

The paper is organized as follows. Section 2 describes in greater detail the objectives
of this research by developing several examples which show how spatially distributed
functional data differ from functional random samples and from functional time series.
In simple settings, it illustrates what kind of consistency or inconsistency results can be
expected, and what kind of difficulties must be overcome. Assumptions on the functional
random fields we study are introduced in Section 3. A crucial part of these assumptions
consists of conditions on the spatial distribution of the points $\mathbf{s}_k$. Section 4 compares our
conditions to those typically assumed for scalar spatial processes. In Sections 5 and 6
we establish consistency results, respectively, for the functional mean and the covariance
operator. These sections also contain examples specializing the general results to more
specific settings. Section 7 explains, by means of general theorems and examples, when
the sample principal components are not consistent. The proofs of the main results are
collected in Section 8.






2 Motivating examples 



Functional principal components play a fundamental role in functional data analysis, much
greater than the usual multivariate principal components. This is mostly due to the fact
that the Karhunen-Loeve expansion allows us to represent functional data in a concise way.
This property has been extensively used and studied in various settings. To name only
a few illustrative references, we cite Yao et al. (2005), Hall and Hosseini-Nasab (2006),
Reiss and Ogden (2007), Gabrys and Kokoszka (2007), Benko et al. (2009), Paul and Peng (2009),
Jiang and Wang (2010) and Gabrys et al. (2010). Depending on the structure of the data,
theoretical analyses emphasize various aspects of the estimation process, with smoothing
in iid samples having been particularly carefully studied. This paper focuses on the
spatial dependence and distribution of the curves, which has received no attention so far.



Suppose $X_1, X_2, \ldots, X_N$ are mean zero identically distributed elements of $L^2 = L^2([0, 1])$
such that $E\|X\|^4 < \infty$, where the norm is the usual norm generated by the inner product
in $L^2$. The covariance operator is then defined for $x \in L^2$ by $C(x) = E[\langle X, x\rangle X]$. Its
eigenfunctions are the functional principal components (FPC's), denoted $v_k$. Up to a
sign, they are estimated by the empirical FPC's (EFPC's), denoted $\hat v_k$ and defined as the
eigenfunctions of the empirical covariance operator

$$\hat C_N(x) = \frac{1}{N} \sum_{n=1}^{N} \langle X_n, x\rangle X_n, \qquad x \in L^2.$$

The distance between $\hat v_k$ and $v_k$ is determined by the distance between $C$ and $\hat C_N$. This
follows from Lemma 2.1, which has often been used. To state it, consider two compact
operators $C$ and $K$ with singular value decompositions

$$C(x) = \sum_{j=1}^{\infty} \lambda_j \langle x, v_j\rangle f_j, \qquad K(x) = \sum_{j=1}^{\infty} \gamma_j \langle x, u_j\rangle g_j. \tag{2.1}$$

Recall that a linear operator $K$ in a separable Hilbert space $H$ is said to be
Hilbert-Schmidt if, for some orthonormal basis $\{e_i\}$ of $H$,

$$\|K\|_{\mathcal S}^2 := \sum_{i \ge 1} \|K(e_i)\|_H^2 < \infty.$$

Then $\|\cdot\|_{\mathcal S}$ defines a norm on the space of all operators satisfying this condition. The
norm is independent of the choice of the basis. This space is again a Hilbert space with
the inner product

$$\langle K_1, K_2\rangle_{\mathcal S} = \sum_{i \ge 1} \langle K_1(e_i), K_2(e_i)\rangle.$$

Set

$$v_j' = c_j v_j, \qquad c_j = \mathrm{sign}(\langle u_j, v_j\rangle).$$
Lemma 2.1 Suppose $C, K$ are two compact operators with singular value decompositions
(2.1). If $C$ and $K$ are Hilbert-Schmidt, symmetric and positive definite, and the
eigenvalues of $C$ satisfy

$$\lambda_1 > \lambda_2 > \cdots > \lambda_d > \lambda_{d+1}, \tag{2.2}$$

then

$$\|u_j - v_j'\| \le \frac{2\sqrt{2}}{\alpha_j} \|K - C\|_{\mathcal S}, \qquad 1 \le j \le d,$$

where $\alpha_1 = \lambda_1 - \lambda_2$ and $\alpha_j = \min(\lambda_{j-1} - \lambda_j, \lambda_j - \lambda_{j+1})$, $2 \le j \le d$.



Lemma 2.1 can be proven using Corollary 1.6 on p. 99 of Gohberg et al. (1990) and
following the lines of the proof of Lemma 4.3 of Bosq (2000).



If the functional observations $X_k$, $k \in \mathbb{Z}$, are independent, then

$$\limsup_{N \to \infty} N E\|\hat C_N - C\|_{\mathcal S}^2 < \infty. \tag{2.3}$$

Consequently, for such functional observations, under (2.2),

$$\max_{1 \le k \le d} E\|\hat c_k \hat v_k - v_k\|^2 = O(N^{-1}),$$

where $\hat c_k = \mathrm{sign}(\langle \hat v_k, v_k\rangle)$.

Hormann and Kokoszka (2010) showed that (2.3) continues to hold for weakly
dependent time series, in particular for $m$-dependent $X_k$. Our first example shows why
$m$-dependence does not imply (2.3) for spatially distributed data.

Example 2.1 Suppose $X_k = X(\mathbf{s}_k)$, where $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_N$ are points in an arbitrary metric
space, and the random field $X(\cdot)$ is such that $X(\mathbf{s})$ is independent of $X(\mathbf{s}')$ if the distance
between $\mathbf{s}$ and $\mathbf{s}'$, $d(\mathbf{s}, \mathbf{s}')$, is greater than $m$. (We continue to assume that the $X_k$ have
the same distribution.) Set

$$B_N(m) = \{(k, \ell) : 1 \le k, \ell \le N \text{ and } d(\mathbf{s}_k, \mathbf{s}_\ell) \le m\},$$

and denote by $|B_N(m)|$ the count of pairs in $B_N(m)$. A brief calculation, which uses the
Cauchy inequality twice, leads to the bound

$$N E\|\hat C_N - C\|_{\mathcal S}^2 \le N^{-1} |B_N(m)| \, E\|X(\mathbf{s})\|^4.$$

If the $\mathbf{s}_k$ are the points in $\mathbb{R}^d$ with integer coordinates, then $|B_N(m)|$ is asymptotically
proportional to $mN$, implying $\limsup_{N \to \infty} N^{-1} |B_N(m)| < \infty$, and the standard rate
(2.3). But if there are too many pairs in $B_N(m)$, this rate will no longer hold.
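The role of the pair count in Example 2.1 can be illustrated numerically. The following is only a sketch under assumed settings: the helper `dependent_pair_count`, the choice $m = 2$, and the two point configurations (unit-spaced points versus the same number of points packed into $[0, 1)$) are illustrative and are not taken from the paper.

```python
import numpy as np

def dependent_pair_count(points, m):
    """Count ordered pairs (k, l), including k == l, with distance <= m."""
    pts = np.asarray(points, dtype=float)
    if pts.ndim == 1:
        pts = pts[:, None]                      # treat a 1-d array as points on the line
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # Euclidean distances between all pairs
    return int((dist <= m).sum())

N, m = 400, 2.0
grid = np.arange(N)          # unit-spaced points: |B_N(m)| grows linearly in N
infill = np.arange(N) / N    # N points in [0, 1): every pair is within distance m

grid_ratio = dependent_pair_count(grid, m) / N
infill_ratio = dependent_pair_count(infill, m) / N
print(grid_ratio, infill_ratio)
```

For the unit-spaced points the ratio $N^{-1}|B_N(m)|$ stays bounded as $N$ grows, whereas for the packed points it grows like $N$, which is the situation in which the bound of Example 2.1 no longer yields the standard rate.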

Example 2.1 shows that if the points are not equispaced and too densely distributed,
then the standard rate (2.3) will no longer hold. The next example shows that in such
cases the EFPC's may not converge at all.
Example 2.2 Consider a functional random field

$$X(\mathbf{s}; t) = \sum_{j=1}^{\infty} \xi_j(\mathbf{s}) e_j(t), \qquad \mathbf{s} \in \mathbb{R}^d, \; t \in [0, 1], \tag{2.4}$$

where $\{e_j, j \ge 1\}$ is a complete orthonormal system and the $\xi_j(\mathbf{s})$ are mean zero random
variables with $E[\xi_i(\mathbf{s}_1)\xi_j(\mathbf{s}_2)] = 0$ if $i \ne j$ and $E[\xi_j(\mathbf{s})\xi_j(\mathbf{s} + \mathbf{h})] = \lambda_j \rho_j(h)$, $h = \|\mathbf{h}\|$,
where $\sum_j \lambda_j < \infty$ and each $\rho_j(\cdot)$ is a positive correlation function. Direct verification
shows that $C(x) = \sum_{j=1}^{\infty} \lambda_j \langle e_j, x\rangle e_j$, so the $\lambda_j$ are the eigenvalues of $C$, and the $e_j$ the
corresponding eigenfunctions.

Now consider a sequence $\mathbf{s}_n \to \mathbf{0}$. Because of the positive dependence, $X(\mathbf{s}_n)$ is
close to $X(\mathbf{0})$, so $\hat C_N$ is close to the random operator $X^* = \langle X(\mathbf{0}), \cdot\rangle X(\mathbf{0})$. Observe that
$X^*(X(\mathbf{0})) = \|X(\mathbf{0})\|^2 X(\mathbf{0})$. Thus $\|X(\mathbf{0})\|^2 = \sum_{j=1}^{\infty} \xi_j^2(\mathbf{0})$ is an eigenvalue of $X^*$. Since it
is random, it cannot be close to any of the $\lambda_j$. The eigenfunctions of $\hat C_N$ are also close to
random functions in $L^2$, and do not converge to the FPC's $e_j$.

The intuition presented in this example is formalized in Section 7, where a specific
numerical example is also given.
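The key observation of Example 2.2, that the limiting operator $X^*$ has the random eigenvalue $\|X(\mathbf{0})\|^2$, can be checked in coordinates. This is a minimal sketch under assumptions not made in the paper: the expansion is truncated to ten terms, the eigenvalues are taken as $\lambda_j = j^{-2}$, and the scores $\xi_j(\mathbf{0})$ are drawn as independent Gaussians.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.0 / np.arange(1, 11) ** 2   # assumed eigenvalues lambda_j of C (truncated)

def limit_operator_top_eigenvalue(rng, lam):
    """Draw xi_j(0) ~ N(0, lambda_j); return the top eigenvalue of X* and ||X(0)||^2."""
    xi = rng.normal(scale=np.sqrt(lam))
    x_star = np.outer(xi, xi)             # matrix of <X(0), .> X(0) in the basis {e_j}
    top = np.linalg.eigvalsh(x_star)[-1]  # eigvalsh returns eigenvalues in ascending order
    return top, float(xi @ xi)

draws = [limit_operator_top_eigenvalue(rng, lam) for _ in range(5)]
for top, sq_norm in draws:
    print(f"top eigenvalue {top:.4f}   ||X(0)||^2 {sq_norm:.4f}")
```

In each draw the two printed numbers agree, and they change from draw to draw, while the eigenvalues $\lambda_j$ of $C$ are fixed.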



The above example shows that if the points $\mathbf{s}_n$ are too close to each other, then the
empirical functional principal components are not consistent estimates of the
population principal components. Other examples of the lack of consistency are known, see
Johnstone and Lu (2009) and references therein. They fall into the "small n large p"
framework, and the lack of consistency is due to noisy data which are not sparsely
represented. A solution is to perform the principal component analysis on transformed data
which admit a sparse representation. The spatial functional data that motivate this
research admit a natural sparse representation; the lack of consistency is due to dependence
and densely distributed locations of the observations. It is not crucial that the $\mathbf{s}_n$ be close
to each other. What matters is the interplay of the spatial distances between these points
and the strength of dependence between the curves. To illustrate, suppose in Example
2.2 the covariance between $X(\mathbf{s}_n)$ and $X(\mathbf{0})$ is

$$E[\langle X(\mathbf{s}_n), x\rangle \langle X(\mathbf{0}), y\rangle] = \sum_{j=1}^{\infty} \lambda_j \exp\left\{-\frac{\|\mathbf{s}_n\|}{\rho_j}\right\} \langle e_j, x\rangle \langle e_j, y\rangle.$$

In a finite sample, small $\|\mathbf{s}_n\|$ have the same effect as large $\rho_j$, i.e. as stronger dependence.

These considerations show that it is useful to have general criteria for functional spatial
data, which combine the spatial distribution of the points and the strength of dependence,
and which ensure that the functional principal components can be consistently estimated,
and, consequently, that further statistical inference for spatial functional data can be
carried out. Such criteria should hold for practically useful models for functional spatial
data. The next example discusses such models, with a rigorous formulation presented in
Section 3.



Example 2.3 Suppose $\{e_j, j \ge 1\}$ is an arbitrary fixed orthonormal basis in $L^2$. Under
very mild assumptions, every constant mean functional random field admits the
representation

$$X(\mathbf{s}) = \mu + \sum_{j=1}^{\infty} \xi_j(\mathbf{s}) e_j, \tag{2.5}$$

where the $\xi_j(\mathbf{s})$ are zero mean random variables. In principle, all properties of $X$, including
the spatial dependence structure, can be equivalently stated as properties of the family of
the scalar fields $\xi_j$. Representation (2.5) is thus the most natural and convenient model
for spatially distributed functional data. More generally, the mean $\mu$ may itself depend
on the spatial location $\mathbf{s}$, but in this paper we study spatially stationary random fields.

Assume that $\mu = 0$ and the field $X$ is strictly stationary (in space); see Section 3 for
a definition. Suppose we want to predict $X(\mathbf{s}_0)$ using a linear combination of the curves
$X(\mathbf{s}_1), X(\mathbf{s}_2), \ldots, X(\mathbf{s}_N)$, i.e. we want to minimize

$$E\left\|X(\mathbf{s}_0) - \sum_{n=1}^{N} a_n X(\mathbf{s}_n)\right\|^2
= E\langle X(\mathbf{s}_0), X(\mathbf{s}_0)\rangle - 2\sum_{n=1}^{N} a_n E\langle X(\mathbf{s}_n), X(\mathbf{s}_0)\rangle + \sum_{k,\ell=1}^{N} a_k a_\ell E\langle X(\mathbf{s}_k), X(\mathbf{s}_\ell)\rangle. \tag{2.6}$$

Thus for the problem of the least squares linear prediction of a mean zero spatial process
we need to know only

$$K(\mathbf{s}, \mathbf{s}') = E[\langle X(\mathbf{s}), X(\mathbf{s}')\rangle]. \tag{2.7}$$

By the orthonormality of the $e_j$ in (2.4),

$$E[\langle X(\mathbf{s}), X(\mathbf{s}')\rangle] = E\left[\left\langle \sum_{i=1}^{\infty} \xi_i(\mathbf{s}) e_i, \sum_{j=1}^{\infty} \xi_j(\mathbf{s}') e_j \right\rangle\right]
= \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} E[\xi_i(\mathbf{s}) \xi_j(\mathbf{s}')] \langle e_i, e_j\rangle = \sum_{j=1}^{\infty} E[\xi_j(\mathbf{s}) \xi_j(\mathbf{s}')].$$

Thus, the functional covariances (2.7) are fully determined by the covariances

$$K_j(\mathbf{s}, \mathbf{s}') = E[\xi_j(\mathbf{s}) \xi_j(\mathbf{s}')]. \tag{2.8}$$

Notice that we do not need to know the cross covariances $E[\xi_i(\mathbf{s})\xi_j(\mathbf{s}')]$ for $i \ne j$.
Thus, if we are interested in kriging, we can assume that the spatial processes $\xi_j(\cdot)$ in
(2.4) are independent. Such an assumption simplifies the verification of some fourth order
properties discussed in the following sections. This observation remains true if the spatial
field does not have zero mean, i.e. if we observe realizations of $Z(\mathbf{s}) = \mu(\mathbf{s}) + X(\mathbf{s})$. A brief
calculation shows that for kriging, it is enough to know $\mu(\cdot)$ and the covariances (2.8).

Stein (1999) and Cressie (1993) provide rigorous accounts of kriging for scalar spatial
data.
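The minimization of (2.6) can be sketched numerically. Setting the gradient of (2.6) with respect to $(a_1, \ldots, a_N)$ to zero gives the linear system $\sum_\ell a_\ell K(\mathbf{s}_k, \mathbf{s}_\ell) = K(\mathbf{s}_k, \mathbf{s}_0)$, $1 \le k \le N$. The code below solves this system under assumptions that are purely illustrative: locations on the line, five components, and exponential covariances $K_j(s, s') = \sigma_j^2 \exp(-|s - s'|/\rho_j)$; the function names are hypothetical.

```python
import numpy as np

def functional_covariance(s, t, sigma2, rho):
    """K(s, t) = sum_j sigma_j^2 exp(-|s - t| / rho_j): an assumed model for (2.7)-(2.8)."""
    dist = np.abs(np.subtract.outer(s, t))
    return (sigma2 * np.exp(-dist[..., None] / rho)).sum(axis=-1)

def kriging_weights(sites, target, sigma2, rho):
    """Solve the normal equations G a = g obtained by differentiating (2.6) in a."""
    gram = functional_covariance(sites, sites, sigma2, rho)                    # K(s_k, s_l)
    rhs = functional_covariance(sites, np.array([target]), sigma2, rho)[:, 0]  # K(s_k, s_0)
    return np.linalg.solve(gram, rhs)

sigma2 = 1.0 / np.arange(1, 6) ** 2      # assumed variances sigma_j^2
rho = np.linspace(0.5, 1.5, 5)           # assumed range parameters rho_j
sites = np.array([0.0, 0.7, 1.5, 3.0])   # assumed observation locations
weights = kriging_weights(sites, 1.0, sigma2, rho)
print(weights)
```

Only the sums $\sum_j K_j$ enter the system, in line with the remark that the cross covariances between different $\xi_i$ and $\xi_j$ are not needed for kriging.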

Our next example shows how representation (2.5) and the independence of the $\xi_j$
allow us to derive the standard rate (2.3) if the points are equispaced on the line and
the covariances decay exponentially. In the following sections, we construct a theory that
allows us to obtain the standard and nonstandard rates of consistency in much more
general settings. We will use the following well-known Lemma.

LEMMA 2.2 Suppose $X$ and $Y$ are jointly normal mean zero random variables such that
$EX^2 = \sigma^2$, $EY^2 = \nu^2$, $E[XY] = \rho\sigma\nu$. Then

$$\mathrm{Cov}(X^2, Y^2) = 2\rho^2\sigma^2\nu^2.$$
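The lemma is stated without proof. One short way to see it, offered here only as a sketch, uses Isserlis' formula for the fourth moments of mean zero jointly normal variables, $E[X^2Y^2] = E[X^2]E[Y^2] + 2(E[XY])^2$:

```latex
% Sketch of Lemma 2.2 via Isserlis' formula (mean zero, jointly normal X and Y).
\begin{align*}
\operatorname{Cov}(X^2, Y^2)
  &= E[X^2 Y^2] - E[X^2]\,E[Y^2] \\
  &= \sigma^2\nu^2 + 2(\rho\sigma\nu)^2 - \sigma^2\nu^2 \\
  &= 2\rho^2\sigma^2\nu^2 .
\end{align*}
```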

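Example 2.4 Suppose $X(\mathbf{s}; t)$ is an arbitrary functional random field observed at
locations $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_N$. Then

$$N E\|\hat C_N - C\|_{\mathcal S}^2 = N^{-1} \sum_{k,\ell=1}^{N} \iint \mathrm{Cov}\big(X(\mathbf{s}_k; t) X(\mathbf{s}_k; u), X(\mathbf{s}_\ell; t) X(\mathbf{s}_\ell; u)\big) \, dt \, du. \tag{2.9}$$

Without any further assumptions, a sufficient condition for the EFPC's to be consistent
with the rate $N^{-1/2}$ is that the right-hand side of (2.9) is bounded from above by a
constant. Under additional assumptions, more precise sufficient conditions are possible.

Suppose first that representation (2.4) holds with independent strictly stationary scalar
fields $\xi_j(\cdot)$. Define the covariances

$$E[\xi_j(\mathbf{s}_k)\xi_j(\mathbf{s}_\ell)] = \gamma_j(\mathbf{s}_k - \mathbf{s}_\ell), \qquad \mathrm{Cov}(\xi_j^2(\mathbf{s}_k), \xi_j^2(\mathbf{s}_\ell)) = \tau_j(\mathbf{s}_k - \mathbf{s}_\ell).$$

Using (2.9), we see that under these assumptions,

$$N E\|\hat C_N - C\|_{\mathcal S}^2 = N^{-1} \sum_{k,\ell=1}^{N} \left\{ \sum_{i \ne j} \gamma_i(\mathbf{s}_k - \mathbf{s}_\ell)\gamma_j(\mathbf{s}_k - \mathbf{s}_\ell) + \sum_{j=1}^{\infty} \tau_j(\mathbf{s}_k - \mathbf{s}_\ell) \right\}.$$

Thus (2.3) holds if

$$\limsup_{N \to \infty} N^{-1} \sum_{k,\ell=1}^{N} \left\{ \sum_{j=1}^{\infty} |\gamma_j(\mathbf{s}_k - \mathbf{s}_\ell)| \right\}^2 < \infty \tag{2.10}$$

and

$$\limsup_{N \to \infty} N^{-1} \sum_{k,\ell=1}^{N} \sum_{j=1}^{\infty} |\tau_j(\mathbf{s}_k - \mathbf{s}_\ell)| < \infty. \tag{2.11}$$

Suppose now, in addition, that $X$ is Gaussian with

$$E[\xi_j(\mathbf{s}_k)\xi_j(\mathbf{s}_\ell)] = \sigma_j^2 \exp\{-\rho_j^{-1} d(\mathbf{s}_k, \mathbf{s}_\ell)\}, \tag{2.12}$$

so that, by Lemma 2.2,

$$\mathrm{Cov}(\xi_j^2(\mathbf{s}_k), \xi_j^2(\mathbf{s}_\ell)) = 2\sigma_j^4 \exp\{-2\rho_j^{-1} d(\mathbf{s}_k, \mathbf{s}_\ell)\}. \tag{2.13}$$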

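Suppose the points are equispaced on the line. Denoting the smallest distance between
the points by $d$, we see that

$$N^{-1} \sum_{k,\ell=1}^{N} \left\{ \sum_{j=1}^{\infty} |\gamma_j(\mathbf{s}_k - \mathbf{s}_\ell)| \right\}^2
= \left( \sum_{j=1}^{\infty} \sigma_j^2 \right)^2 + 2N^{-1} \sum_{m=1}^{N-1} (N - m) \left\{ \sum_{j=1}^{\infty} \sigma_j^2 \exp(-\rho_j^{-1} m d) \right\}^2.$$

If we assume that

$$\sum_{j=1}^{\infty} \sigma_j^2 < \infty \quad \text{and} \quad \sup_j \rho_j \le \rho < \infty, \tag{2.14}$$

then Conditions (2.10) and (2.11) hold. Condition (2.14) means that the correlation
functions of all processes $\xi_j(\cdot)$ must decay uniformly sufficiently fast.
To verify (2.10), observe that

$$\begin{aligned}
N^{-1} \sum_{m=1}^{N-1} (N - m) \left\{ \sum_{j=1}^{\infty} \sigma_j^2 \exp(-\rho_j^{-1} m d) \right\}^2
&\le \sum_{m=1}^{N-1} \left\{ \sum_{j=1}^{\infty} \sigma_j^2 \exp(-\rho_j^{-1} m d) \right\}^2
\le \sum_{m=1}^{N-1} \left\{ \sum_{j=1}^{\infty} \sigma_j^2 \exp(-\rho^{-1} m d) \right\}^2 \\
&= \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \sigma_i^2 \sigma_j^2 \sum_{m=1}^{N-1} \exp(-2\rho^{-1} m d)
= O(1) \left( \sum_{j=1}^{\infty} \sigma_j^2 \right)^2 = O(1).
\end{aligned}$$

The verification of (2.11) is analogous because (2.14) implies $\sum_j \sigma_j^4 < \infty$.

We will see that Condition (2.14) (formulated analogously for several classes of models)
is applicable in much more general settings than equispaced points on the line.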
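The boundedness in (2.10) for equispaced points can also be checked numerically. The sketch below compares a direct double sum with the lag decomposition used above, under assumed values that do not come from the paper: the expansion is truncated to $J = 50$ terms, $\sigma_j^2 = j^{-2}$, $\rho_j = 1 - 1/(j+1)$ and $d = 0.5$.

```python
import numpy as np

def condition_210_direct(N, d, sigma2, rho):
    """N^{-1} sum_{k,l} (sum_j sigma_j^2 exp(-|k-l| d / rho_j))^2 for equispaced points."""
    lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N))) * d
    gamma_sum = (sigma2 * np.exp(-lags[..., None] / rho)).sum(axis=-1)
    return (gamma_sum ** 2).sum() / N

def condition_210_by_lags(N, d, sigma2, rho):
    """Same quantity, split into the diagonal term and the sum over lags m = 1..N-1."""
    m = np.arange(1, N)
    inner = (sigma2 * np.exp(-m[:, None] * d / rho)).sum(axis=1)
    return sigma2.sum() ** 2 + 2.0 / N * ((N - m) * inner ** 2).sum()

J = 50
sigma2 = 1.0 / np.arange(1, J + 1) ** 2   # summable sigma_j^2 (assumed)
rho = 1.0 - 1.0 / np.arange(2, J + 2)     # ranges bounded by 1 (assumed)
d = 0.5
for N in (50, 100, 200):
    print(N, condition_210_direct(N, d, sigma2, rho))
```

The printed values stabilize as $N$ grows, consistent with the $O(1)$ bound derived from (2.14).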
3 Models and Assumptions

We assume $\{X(\mathbf{s}), \mathbf{s} \in \mathbb{R}^d\}$ is a random field taking values in $L^2 = L^2([0, 1])$, i.e. each
$X(\mathbf{s})$ is a square integrable function defined on $[0, 1]$. The value of this function at $t \in [0, 1]$
is denoted by $X(\mathbf{s}; t)$. With the usual inner product in $L^2$, the norm of $X(\mathbf{s})$ is

$$\|X(\mathbf{s})\| = \left\{ \int X^2(\mathbf{s}; t) \, dt \right\}^{1/2}.$$

We assume that the spatial process $\{X(\mathbf{s}), \mathbf{s} \in \mathbb{R}^d\}$ is strictly stationary, i.e. for every
$\mathbf{h} \in \mathbb{R}^d$,

$$(X(\mathbf{s}_1), X(\mathbf{s}_2), \ldots, X(\mathbf{s}_k)) \overset{d}{=} (X(\mathbf{s}_1 + \mathbf{h}), X(\mathbf{s}_2 + \mathbf{h}), \ldots, X(\mathbf{s}_k + \mathbf{h})). \tag{3.1}$$

We also assume that it is square integrable in the sense that

$$E\|X(\mathbf{s})\|^2 < \infty. \tag{3.2}$$

Under (3.1) and (3.2), the common mean function is denoted by $\mu = EX(\mathbf{s})$.

The first question is how we can assure the existence of stationary spatial functional
models. The most direct and convenient way is to construct them using (2.5).
As $\{e_j\}$ is a basis, every a priori given functional field $X$ admits expansion (2.5). Since
$\xi_j(\mathbf{s}) = \langle X(\mathbf{s}) - \mu, e_j\rangle$, the functional field $X$ is strictly stationary if and only if each
scalar field $\xi_j$ is strictly stationary. By Parseval's identity $\|X(\mathbf{s}) - \mu\|^2 = \sum_{j \ge 1} \xi_j^2(\mathbf{s})$, so
(3.2) holds if and only if $\sum_{j \ge 1} E\xi_j^2(\mathbf{s}) < \infty$.

The cross-covariance operators are defined by

$$C_{\mathbf{s}_1, \mathbf{s}_2}(x) := E[\langle X(\mathbf{s}_1) - \mu, x\rangle (X(\mathbf{s}_2) - \mu)]
= \sum_{j \ge 1} \sum_{k \ge 1} E[\xi_j(\mathbf{s}_1)\xi_k(\mathbf{s}_2)] \langle e_j, x\rangle e_k. \tag{3.3}$$

In particular, the covariance operator $C$ is defined by

$$C(x) = C_{\mathbf{s}, \mathbf{s}}(x) = E[\langle X(\mathbf{s}) - \mu, x\rangle (X(\mathbf{s}) - \mu)]. \tag{3.4}$$

If a process has the representation (2.5) with uncorrelated random fields $\xi_j(\cdot)$, i.e.

$$E[\xi_i(\mathbf{s}_1)\xi_j(\mathbf{s}_2)] = 0 \quad \text{for all } \mathbf{s}_1, \mathbf{s}_2 \text{ if } i \ne j,$$

then the $e_j$ are the eigenfunctions of $C$ and the $\lambda_j = E\xi_j^2(\mathbf{s})$ are the corresponding
eigenvalues.

To develop an estimation framework, we impose conditions on the decay of the
cross-covariances $E[\langle X(\mathbf{s}_1) - \mu, X(\mathbf{s}_2) - \mu\rangle]$ as the distance between $\mathbf{s}_1$ and $\mathbf{s}_2$ increases. We
shall use the distance function defined by the Euclidean norm in $\mathbb{R}^d$, denoted $\|\mathbf{s}_1 - \mathbf{s}_2\|_2$,
but other distance functions can be used as well.
Assumption 3.1 The spatial process $\{X(\mathbf{s}), \mathbf{s} \in \mathbb{R}^d\}$ is strictly stationary and square
integrable, i.e. (3.1) and (3.2) hold. In addition,

$$|E\langle X(\mathbf{s}_1) - \mu, X(\mathbf{s}_2) - \mu\rangle| \le h(\|\mathbf{s}_1 - \mathbf{s}_2\|_2), \tag{3.5}$$

where $h : [0, \infty) \to [0, \infty)$ with $h(x) \searrow 0$ as $x \to \infty$.

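If the process $\{X(\mathbf{s}), \mathbf{s} \in \mathbb{R}^d\}$ has representation (2.5) with some basis $\{e_j\}$, then it
can be easily seen from (3.3) that (3.5) is equivalent to

$$\left| \sum_{j \ge 1} E[\xi_j(\mathbf{s}_1)\xi_j(\mathbf{s}_2)] \right| \le h(\|\mathbf{s}_1 - \mathbf{s}_2\|_2). \tag{3.6}$$

Notice also the relation

$$\langle C_{\mathbf{s}_1, \mathbf{s}_2}(e_j), e_j\rangle = E[\xi_j(\mathbf{s}_1)\xi_j(\mathbf{s}_2)],$$

which follows also from (3.3). If we assume more specifically that

$$E[\xi_j(\mathbf{s}_1)\xi_j(\mathbf{s}_2)] = \phi_j(\|\mathbf{s}_1 - \mathbf{s}_2\|_2), \tag{3.7}$$

then (3.5) is equivalent to

$$\left| \sum_{j \ge 1} \phi_j(\|\mathbf{s}_1 - \mathbf{s}_2\|_2) \right| \le h(\|\mathbf{s}_1 - \mathbf{s}_2\|_2). \tag{3.8}$$

Examples 3.1 and 3.2 consider typical spatial covariance functions, and show when
condition (3.8) holds with a function $h$ as in Assumption 3.1.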

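Example 3.1 Suppose that the fields $\{\xi_j(\mathbf{s}), \mathbf{s} \in \mathbb{R}^d\}$, $j \ge 1$, are zero mean, strictly
stationary and $\alpha$-mixing. That is,

$$\sup_{(A, B) \in \sigma(\xi_j(\mathbf{s})) \times \sigma(\xi_j(\mathbf{s} + \mathbf{h}))} |P(A)P(B) - P(A \cap B)| \le \alpha_j(\mathbf{h}),$$

with $\alpha_j(\mathbf{h}) \to 0$ if $\|\mathbf{h}\|_2 \to \infty$. Let $\alpha_j'(h) = \sup\{\alpha_j(\mathbf{h}) : \|\mathbf{h}\|_2 = h\}$. Then $\alpha_j^*(h) =
\sup\{\alpha_j'(x) : x \ge h\} \searrow 0$ as $h \to \infty$. Using stationarity and the main result in Rio (1993),
it follows that

$$\begin{aligned}
|E\xi_j(\mathbf{s}_1)\xi_j(\mathbf{s}_2)| &= |E\xi_j(\mathbf{0})\xi_j(\mathbf{s}_2 - \mathbf{s}_1)| \\
&\le 2 \int_0^{2\alpha_j(\mathbf{s}_2 - \mathbf{s}_1)} Q_j^2(u) \, du \\
&\le 2 \int_0^{2\alpha_j^*(\|\mathbf{s}_2 - \mathbf{s}_1\|_2)} Q_j^2(u) \, du =: \phi_j(\|\mathbf{s}_2 - \mathbf{s}_1\|_2),
\end{aligned}$$

where $Q_j(u) = \inf\{t : P(|\xi_j(\mathbf{0})| > t) \le u\}$ is the quantile function of $|\xi_j(\mathbf{0})|$. Note that
$\alpha_j(\mathbf{h}) \le 1/4$ for any $\mathbf{h}$, and thus $\phi_j(x) \le 2\int_0^1 Q_j^2(u) \, du = 2E[\xi_j^2(\mathbf{0})]$. If $\sum_{j \ge 1} E\xi_j^2(\mathbf{0}) < \infty$,
then (3.8) holds with $h(x) = \sum_{j \ge 1} \phi_j(x)$. (Note that $h(x) \searrow 0$ follows from $\alpha_j^*(x) \searrow 0$
and the monotone convergence theorem.)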

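Example 3.2 Suppose (3.7) holds, and set $h(x) = \sum_{j \ge 1} \phi_j(x)$. If each $\phi_j$ is a powered
exponential covariance function defined by

$$\phi_j(x) = \sigma_j^2 \exp\left\{ -\left( \frac{x}{\rho_j} \right)^p \right\},$$

then $h$ satisfies the conditions of Assumption 3.1 if

$$\sum_{j \ge 1} \sigma_j^2 < \infty \quad \text{and} \quad \sup_j \rho_j < \infty. \tag{3.9}$$

Condition (3.9) is also sufficient if all $\phi_j$ are in the Matern class, see Stein (1999), with
the same $\nu$, i.e.

$$\phi_j(x) = \sigma_j^2 x^\nu K_\nu(x/\rho_j),$$

because the modified Bessel function $K_\nu$ decays monotonically and approximately
exponentially fast; numerical calculations show that $K_\nu(s)$ practically vanishes if $s > \nu$.
Condition (3.9) is clearly sufficient for spherical $\phi_j$ defined (for $d = 3$) by

$$\phi_j(x) = \begin{cases} \sigma_j^2 \left( 1 - \dfrac{3}{2} \dfrac{x}{\rho_j} + \dfrac{1}{2} \dfrac{x^3}{\rho_j^3} \right), & x \le \rho_j, \\ 0, & x > \rho_j, \end{cases}$$

because $\phi_j$ is decreasing on $[0, \rho_j]$.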
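As a quick illustration of Example 3.2, the function $h(x) = \sum_j \phi_j(x)$ can be evaluated for powered exponential covariances and checked to be decreasing. The sketch uses assumed values only: a truncation at $J = 200$ terms, $\sigma_j^2 = j^{-2}$, $\rho_j = 2 - 1/j$ and exponent $1.5$; none of these come from the paper.

```python
import numpy as np

def h_powered_exponential(x, sigma2, rho, p):
    """h(x) = sum_j sigma_j^2 exp(-(x / rho_j)^p), truncated to len(sigma2) terms."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return (sigma2 * np.exp(-(x[:, None] / rho) ** p)).sum(axis=1)

J = 200
sigma2 = 1.0 / np.arange(1, J + 1) ** 2   # summable variances (assumed)
rho = 2.0 - 1.0 / np.arange(1, J + 1)     # bounded ranges, all below 2 (assumed)
grid = np.linspace(0.0, 10.0, 201)
h_vals = h_powered_exponential(grid, sigma2, rho, p=1.5)
print(h_vals[0], h_vals[-1])
```

At $x = 0$ the function equals $\sum_j \sigma_j^2$, and it decreases towards zero along the grid, which is the behaviour required of $h$ in Assumption 3.1.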



Assumption 3.1 is appropriate when studying estimation of the mean function. For the
estimation of the covariance operator, we need to impose a different assumption. Recall
that if $z$ and $y$ are elements in some Hilbert space $H$ with norm $\|\cdot\|_H$, the operator $z \otimes y$
is defined by $z \otimes y(x) = \langle z, x\rangle y$. In the following assumption, we suppose that the mean
of the functional field is zero. This is justified by notational convenience and because we
deal with the consistent estimation of the mean function separately.

Assumption 3.2 The spatial process $\{X(\mathbf{s}), \mathbf{s} \in \mathbb{R}^d\}$ is strictly stationary with zero mean
and with 4 moments, i.e. $E\langle X(\mathbf{s}), x\rangle = 0$, $\forall x \in L^2$, and $E\|X(\mathbf{s})\|^4 < \infty$. In addition,

$$|E\langle X(\mathbf{s}_1) \otimes X(\mathbf{s}_1) - C, X(\mathbf{s}_2) \otimes X(\mathbf{s}_2) - C\rangle_{\mathcal S}| \le H(\|\mathbf{s}_1 - \mathbf{s}_2\|_2), \tag{3.10}$$

where $H : [0, \infty) \to [0, \infty)$ with $H(x) \searrow 0$ as $x \to \infty$.
Assumption 3.2 cannot be verified using only conditions on the covariances of the
scalar fields $\xi_j$ in (2.5) because these covariances do not specify the 4th order structure
of the model. This can be done if the random field is Gaussian, as illustrated in Example
6.1, or if additional structure is imposed. If the scalar fields $\xi_j(\cdot)$ are independent, the
following Lemma can be used to verify (3.10).



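Lemma 3.1 Let $X(\mathbf{s})$ have representation (2.5) with zero mean and $E\|X(\mathbf{s})\|^4 < \infty$.
Assume further that $\xi_i(\cdot)$ and $\xi_j(\cdot)$ are independent if $i \ne j$. Then

$$|E\langle X(\mathbf{s}_1) \otimes X(\mathbf{s}_1) - C, X(\mathbf{s}_2) \otimes X(\mathbf{s}_2) - C\rangle_{\mathcal S}|$$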

< 



J>1 J>1 



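PROOF: If $\xi_i(\cdot)$ and $\xi_j(\cdot)$ are independent for $i \ne j$, then the $e_j$ are the eigenfunctions of
$C$, and the $\xi_j(\mathbf{s})$ are the principal component scores with $E\xi_j^2(\mathbf{s}) = \lambda_j$. Using continuity
of the inner product and dominated convergence we obtain

$$\begin{aligned}
&|E\langle X(\mathbf{s}_1) \otimes X(\mathbf{s}_1) - C, X(\mathbf{s}_2) \otimes X(\mathbf{s}_2) - C\rangle_{\mathcal S}| \\
&\quad = \left| E \sum_{j \ge 1} \big\langle \langle X(\mathbf{s}_1), e_j\rangle X(\mathbf{s}_1) - C(e_j), \langle X(\mathbf{s}_2), e_j\rangle X(\mathbf{s}_2) - C(e_j) \big\rangle \right| \\
&\quad = \left| E \sum_{j \ge 1} \Big\langle \sum_{\ell \ge 1} \xi_j(\mathbf{s}_1)\xi_\ell(\mathbf{s}_1) e_\ell - \lambda_j e_j, \sum_{k \ge 1} \xi_j(\mathbf{s}_2)\xi_k(\mathbf{s}_2) e_k - \lambda_j e_j \Big\rangle \right| \\
&\quad = \left| \sum_{j \ge 1} E\Big[ \xi_j(\mathbf{s}_1)\xi_j(\mathbf{s}_2) \sum_{\ell \ge 1} \xi_\ell(\mathbf{s}_1)\xi_\ell(\mathbf{s}_2) + \lambda_j^2 - \lambda_j \xi_j^2(\mathbf{s}_1) - \lambda_j \xi_j^2(\mathbf{s}_2) \Big] \right|
\end{aligned}$$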



< 



< 



i>i j>i 



As already noted, for spatial processes assumptions on the distribution of the sampling
points are as important as those on the covariance structure. To formalize the different
sampling schemes introduced in Section 1, we propose the following measure of "minimal
dispersion" of some point cloud $\mathfrak{S}$:

$$I_\rho(\mathbf{s}, \mathfrak{S}) = |\{\mathbf{y} \in \mathfrak{S} : \|\mathbf{s} - \mathbf{y}\|_2 \le \rho\}| / |\mathfrak{S}| \quad \text{and} \quad I_\rho(\mathfrak{S}) = \sup\{I_\rho(\mathbf{s}, \mathfrak{S}), \mathbf{s} \in \mathfrak{S}\},$$

where $|\mathfrak{S}|$ denotes the number of elements of $\mathfrak{S}$. The quantity $I_\rho(\mathfrak{S})$ is the maximal
fraction of $\mathfrak{S}$-points in a ball of radius $\rho$ centered at an element of $\mathfrak{S}$. Notice that
$1/|\mathfrak{S}| \le I_\rho(\mathfrak{S}) \le 1$. We call $\rho \mapsto I_\rho(\mathfrak{S})$ the intensity function of $\mathfrak{S}$.
Definition 3.1 For a sampling scheme $\mathfrak{S}_N = \{\mathbf{s}_{i,N}; 1 \le i \le S_N\}$, $S_N \to \infty$, we consider
the following conditions:

(i) there is a $\rho > 0$ such that $\limsup_{N \to \infty} I_\rho(\mathfrak{S}_N) > 0$;

(ii) for some sequence $\rho_N \to \infty$ we have $I_{\rho_N}(\mathfrak{S}_N) \to 0$;

(iii) for any fixed $\rho > 0$ we have $S_N I_\rho(\mathfrak{S}_N) \to \infty$.

We call a deterministic sampling scheme $\mathfrak{S}_N = \{\mathbf{s}_{i,N}; 1 \le i \le S_N\}$

Type A if (i) holds;

Type B if (ii) and (iii) hold;

Type C if (ii) holds, but there is a $\rho > 0$ such that $\limsup_{N \to \infty} S_N I_\rho(\mathfrak{S}_N) < \infty$.

If the sampling scheme is stochastic we call it Type A, B or C if relations (i), (ii) and (iii)
hold with $I_\rho(\mathfrak{S}_N)$ replaced by $E I_\rho(\mathfrak{S}_N)$.

Type A sampling is related to purely infill domain sampling, which corresponds to
$I_\rho(\mathfrak{S}_N) = 1$ for all $N \ge 1$, provided $\rho$ is large enough. However, in contrast to the purely
infill domain sampling, it still allows for a non-degenerate asymptotic theory for sparse
enough subsamples (in the sense of Type B or C).

Example 3.3 Assume that $\mathfrak{S}_N$ are sampling points on the line with $s_{2k} = 1/k$ and
$s_{2k+1} = k$, $1 \le k \le N$. Then, for $\rho = 1$, $\lim_{N \to \infty} I_\rho(\mathfrak{S}_N) = 1/2$, so this sampling scheme
is of Type A. But the subsample corresponding to odd indices is of Type C.
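The intensity function and Example 3.3 can be checked with a few lines of code. This is a sketch only: the function name `intensity` and the choice $N = 500$ are illustrative, and the pairwise distance matrix is a brute-force computation suitable for small samples.

```python
import numpy as np

def intensity(points, rho):
    """I_rho(S): largest fraction of points in a closed ball of radius rho centred at a point of S."""
    pts = np.asarray(points, dtype=float)
    if pts.ndim == 1:
        pts = pts[:, None]                      # points on the line
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # Euclidean distances between all pairs
    return (dist <= rho).sum(axis=1).max() / len(pts)

N = 500
k = np.arange(1, N + 1)
scheme = np.concatenate([1.0 / k, k.astype(float)])   # s_{2k} = 1/k and s_{2k+1} = k
integers_only = k.astype(float)                        # subsample s_{2k+1} = k

print(intensity(scheme, 1.0))              # close to 1/2 for large N
print(N * intensity(integers_only, 1.0))   # stays bounded as N grows
```

For the full scheme the ball of radius 1 around the point 1 contains all points $1/k$ together with the integers 1 and 2, which gives the fraction $(N+2)/(2N)$. For the integer subsample at most three points fall in any such ball, so $S_N I_1(\mathfrak{S}_N)$ remains bounded.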

A brief reflection shows that assumptions (i) and (ii) are mutually exclusive. Com- 
bining (ii) and (iii) implies that the points intensify (at least at certain spots) excluding 
the purely increasing domain sampling. Hence the Type B sampling corresponds to the 
nearly infill domain sampling. If only (ii) holds, but (iii) does not (Type C sampling) 
then the sampling scheme corresponds to purely increasing domain sampling. 

Our conditions are more general than those proposed so far. Their relation to more
specific sampling designs previously used is discussed in Section 4.



4 Regular spatial designs 

We continue to assume a spatial design $\mathfrak{S}_N = \{s_{k,N},\ 1 \le k \le S_N\}$. The two special cases we discuss are closely related to those considered by Lahiri (2003). The points are assumed to be on a grid of an increasing size, or to have a density. The results of this section show how our more general assumptions look in these special cases, and provide additional intuition behind the sampling designs formulated in Definition 3.1. They also set a framework for some results of Sections 5 and 6.
4.1 Non-random regular design

Let $\mathcal{Z}(\delta)$ be a lattice in $\mathbb{R}^d$ with increments $\delta_i$ in the $i$-th direction. Let $\delta_0 = \min\{\delta_1, \ldots, \delta_d\}$, $\Delta^d = \prod_{i=1}^d \delta_i$ and let $R_N = \alpha_N R_0$, where $R_0$ is some bounded Riemann measurable Borel set in $\mathbb{R}^d$ containing the origin. A set is Riemann measurable if its indicator function is Riemann integrable. This condition excludes highly irregular sets $R_0$. The scaling parameters $\alpha_N > 0$ are assumed to be non-decreasing and will be specified below in Lemma 4.2. We assume without loss of generality that $\mathrm{Vol}(R_0) = 1$, hence $\mathrm{Vol}(R_N) = \alpha_N^d$. Typical examples are $R_0 = \{x \in \mathbb{R}^d : \|x\| \le z_{1,d}\}$, with $z_{1,d}$ equal to the radius of the $d$-dimensional sphere with volume 1, or $R_0 = [-1/2, 1/2]^d$. The sampling points $\mathfrak{S}_N$ are defined as $\{s_{k,N},\ 1 \le k \le S_N\} = \mathcal{Z}(\eta_N \delta) \cap R_N$, where $\eta_N$ is chosen such that the sample size $S_N \sim N$. It is intuitively clear that $\mathrm{Vol}(R_N) \approx \eta_N^d \Delta^d S_N$, suggesting

(4.1) $$\eta_N = \frac{\alpha_N}{\Delta N^{1/d}}.$$

A formal proof that $\eta_N$ in (4.1) assures $S_N \sim N$ is immediate from the following

Lemma 4.1 Let $K$ be a bounded set in $\mathbb{R}^d$, and assume that $K$ is Riemann measurable with $\mathrm{Vol}(K) = 1$. If $\beta_N \to 0$, then

$$|K \cap \mathcal{Z}(\beta_N \delta)| \sim \frac{1}{\Delta^d \beta_N^d}.$$



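PROOF: Let $K \subset M_1 \subset M_2$ where $M_1$ and $M_2$ are rectangles in $\mathbb{R}^d$ having no intersecting margin ($M_1$ is an inner subset of $M_2$). The points $\{x_{i,N}\} = \mathcal{Z}(\beta_N \delta) \cap M_2$ can be seen as the vertices of rectangles $J_{i,N} = x_{i,N} + \{t \circ \beta_N \delta,\ t \in [0,1]^d\}$, where $\circ$ denotes the Hadamard (entrywise) product. For large enough $N$, the sets $L_{i,N} = J_{i,N} \cap M_1$ define a partition of $M_1$. Then, by the assumed Riemann measurability,

$$\begin{aligned}
\int_{M_1} I_K(x)\,dx
&= \liminf_{N\to\infty} \beta_N^d \Delta^d \sum_i \inf\{I_K(x) : x \in L_{i,N}\} \\
&\le \liminf_{N\to\infty} \beta_N^d \Delta^d \sum_i I_K(x_{i,N}) \\
&\le \limsup_{N\to\infty} \beta_N^d \Delta^d \sum_i I_K(x_{i,N}) \\
&\le \limsup_{N\to\infty} \beta_N^d \Delta^d \sum_i \sup\{I_K(x) : x \in L_{i,N}\} \\
&= \int_{M_1} I_K(x)\,dx.
\end{aligned}$$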



The following Lemma relates the non-random regular design to Definition 3.1. We write $a_N \gg b_N$ if $\limsup b_N/a_N < \infty$.

Lemma 4.2 In the above described design the following pairs of statements are equivalent:

(i) $\alpha_N$ remains bounded $\Leftrightarrow$ Type A sampling;

(ii) $\alpha_N \to \infty$ and $\alpha_N = o(N^{1/d})$ $\Leftrightarrow$ Type B sampling;

(iii) $\alpha_N \gg N^{1/d}$ $\Leftrightarrow$ Type C sampling.
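
The three regimes of Lemma 4.2 can be illustrated on the line ($d = 1$). The sketch below is an illustration under stated assumptions, not the authors' construction: it takes $R_0 = [-1/2, 1/2]$, $\Delta = 1$, exactly $N$ grid points with spacing $\eta_N = \alpha_N/N$, and the hypothetical helpers `regular_grid_1d` and `scaled_intensity`.

```python
import numpy as np

def regular_grid_1d(N, alpha):
    """N equispaced points on alpha * [-1/2, 1/2] with spacing eta = alpha / N."""
    eta = alpha / N
    return eta * (np.arange(N) - (N - 1) / 2.0)

def scaled_intensity(points, rho):
    """Return S_N * I_rho: the largest number of points in a closed ball of radius rho."""
    dist = np.abs(points[:, None] - points[None, :])
    return int((dist <= rho).sum(axis=1).max())

rho = 1.01   # slightly above 1 so that grid ties at distance exactly 1 are unambiguous
for N in (100, 400, 1600):
    bounded = scaled_intensity(regular_grid_1d(N, 1.0), rho)           # alpha_N = 1
    nearly = scaled_intensity(regular_grid_1d(N, np.sqrt(N)), rho)     # alpha_N = N^(1/2)
    increasing = scaled_intensity(regular_grid_1d(N, float(N)), rho)   # alpha_N = N
    print(N, bounded / N, nearly, increasing)
```

With $\alpha_N = 1$ the fraction $I_\rho$ equals 1 (Type A); with $\alpha_N = N^{1/2}$ the count $S_N I_\rho$ grows while $I_\rho$ shrinks (Type B); with $\alpha_N = N$ the count stays at 3 (Type C).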

PROOF: Let $U_\varepsilon(x)$ be the sphere in $\mathbb{R}^d$ with center $x$ and radius $\varepsilon$. Assume first that $\alpha_N = o(N^{1/d})$, which covers (i) and (ii). In this case the volume of the rectangles $L_{i,N}$ as described in the proof of Lemma 4.1 satisfies

(4.2) $$\mathrm{Vol}(L_{i,N}) = \Delta^d \eta_N^d = \frac{\alpha_N^d}{N} \to 0.$$

Hence $|U_\rho(x) \cap \mathcal{Z}(\eta_N \delta)|$ is asymptotically proportional to

$$\mathrm{Vol}(U_\rho(x)) / \mathrm{Vol}(L_{i,N}) = V_d \Big(\frac{\rho}{\alpha_N}\Big)^d N,$$

where $V_d$ is the volume of the $d$-dimensional unit sphere. Now if we fix an arbitrary $\rho_0 > 0$ then there are constants $0 < C_L < C_U < \infty$, such that for any $\rho \ge \rho_0$ and $N \ge N_0$ and $x \in \mathbb{R}^d$

$$C_L \Big(\frac{\rho}{\alpha_N}\Big)^d \le \frac{|U_\rho(x) \cap \mathcal{Z}(\eta_N \delta)|}{N} \le C_U \Big(\frac{\rho}{\alpha_N}\Big)^d.$$

By the required Riemann measurability we can find an $x \in R_0$ such that for some small enough $\varepsilon$ we have $U_{2\varepsilon}(x) \subset R_0$. Then $U_{2\varepsilon\alpha_N}(\alpha_N x) \subset R_N$. Hence for any $2\rho_0 \le \rho \le \varepsilon\alpha_N$,

$$C_L \Big(\frac{\rho}{2\alpha_N}\Big)^d \le \frac{|U_{\rho/2}(\alpha_N x) \cap \mathfrak{S}_N|}{N} \le I_\rho(\mathfrak{S}_N) \le \frac{|U_{2\rho}(\alpha_N x) \cap \mathfrak{S}_N|}{N} \le C_U \Big(\frac{2\rho}{\alpha_N}\Big)^d.$$

With the help of the above inequalities (i) and (ii) are easily checked.

Now we prove (iii). We notice that by (4.2), $\alpha_N \gg N^{1/d}$ is equivalent to $\mathrm{Vol}(L_{i,N})$ not converging to 0. Assume first that we have Type C sampling. Then by the arguments above we find an $x$ and a $\rho > 0$ such that $U_\rho(\alpha_N x) \subset R_N$. Thus

$$|U_\rho(\alpha_N x) \cap \mathcal{Z}(\eta_N \delta)| \le S_N I_\rho(\mathfrak{S}_N).$$

As this quantity remains bounded, $\mathrm{Vol}(L_{i,N})$ does not converge to 0.

On the other hand, if $\mathrm{Vol}(L_{i,N})$ does not converge to 0 then for any $\rho > 0$ and any $x \in \mathbb{R}^d$ we have $\limsup_{N\to\infty} |U_\rho(x) \cap \mathcal{Z}(\eta_N \delta)| < \infty$ and thus for arbitrary large $\rho$

$$I_\rho(\mathfrak{S}_N) \le \sup_x \frac{|U_\rho(x) \cap \mathcal{Z}(\eta_N \delta)|}{S_N} \to 0.$$

The claim follows immediately.
4.2 Randomized design

Let $\{s_k,\ 1 \le k \le N\}$ be iid random vectors with a density $f(s)$ which has support on a Borel set $R_0 \subset \mathbb{R}^d$ containing the origin and satisfying $\mathrm{Vol}(R_0) = 1$. Again we assume Riemann measurability for $R_0$ to exclude highly irregular sets. For the sake of simplicity we shall assume that on $R_0$ the density is bounded away from zero, so that we have $0 < f_L \le \inf_{x \in R_0} f(x)$. The point set $\{s_{k,N},\ 1 \le k \le N\}$ is defined by $s_{k,N} = \alpha_N s_k$ for $k = 1, \ldots, N$. For fixed $N$, this is equivalent to: $\{s_{k,N},\ 1 \le k \le N\}$ is an iid sequence on $R_N = \alpha_N R_0$ with density $\alpha_N^{-d} f(\alpha_N^{-1} s)$.

We cannot expect to obtain a full analogue of Lemma 4.2 in the randomized setup. For Type C sampling, the problem is much more delicate, and a closer study shows that it is related to the oscillation behavior of multivariate empirical processes. While Stute (1984) gives almost sure upper bounds, we would need here sharp results on the moments of the modulus of continuity of multivariate empirical processes. Such results exist, see Einmahl and Ruymgaart (1987), but are connected to technical assumptions on the bandwidth for the modulus (here determined by $\alpha_N$) which are not satisfied in our setup. Hence a detailed treatment would go beyond the scope of this paper. We thus state here the following lemma.

Lemma 4.3 In the above described sampling scheme the following statements hold:

(i) $\alpha_N$ remains bounded $\Rightarrow$ Type A sampling;

(ii) $\alpha_N \to \infty$ and $\alpha_N = o(N^{1/d})$ $\Rightarrow$ Type B sampling.



Proof. By Jensen's inequality we infer that

$$\begin{aligned}
E I_\rho(\mathfrak{S}_N)
&= E \sup_{x \in R_N} \frac{1}{N} \sum_{k=1}^N I\{s_{k,N} \in U_\rho(x) \cap R_N\} \\
&\ge \sup_{x \in R_N} P(s_{1,N} \in U_\rho(x) \cap R_N) \\
&= \sup_{x \in R_0} P(s_1 \in U_{\rho/\alpha_N}(x) \cap R_0) \\
&= \sup_{x \in R_0} \int_{U_{\rho/\alpha_N}(x) \cap R_0} f(s)\,ds.
\end{aligned}$$

We have two scenarios. First, $\alpha_N$ remains bounded. Then we can choose $\rho$ big enough such that $U_{\rho/\alpha_N}(0)$ covers $R_0$ for all $N$. It follows that $\limsup_{N\to\infty} E I_\rho(\mathfrak{S}_N) = 1$ and (i) follows.

Second, $\alpha_N \to \infty$. Then for large enough $N$, $R_0$ contains a ball with radius $\rho/\alpha_N$. It follows that

(4.3) $$E I_\rho(\mathfrak{S}_N) \ge f_L V_d \Big(\frac{\rho}{\alpha_N}\Big)^d.$$

Now statement (ii) follows easily.
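
The lower bound (4.3) can be compared with a Monte Carlo estimate of $E I_\rho(\mathfrak{S}_N)$. This is a rough sketch under assumptions that are not in the text: $d = 1$, the uniform density on $R_0 = [-1/2, 1/2]$ (so $f_L = 1$ and $V_1 = 2$), balls centered at sample points, and the hypothetical helpers `intensity_1d` and `mean_intensity`.

```python
import numpy as np

def intensity_1d(points, rho):
    """I_rho for points on the line, with closed balls centered at the sample points."""
    pts = np.sort(points)
    # number of points in [s - rho, s + rho] for every sample point s
    upper = np.searchsorted(pts, pts + rho, side="right")
    lower = np.searchsorted(pts, pts - rho, side="left")
    return (upper - lower).max() / len(pts)

def mean_intensity(N, alpha, rho, reps=20, seed=0):
    """Monte Carlo estimate of E I_rho for s_{k,N} = alpha * s_k, s_k uniform on [-1/2, 1/2]."""
    rng = np.random.default_rng(seed)
    draws = [intensity_1d(alpha * rng.uniform(-0.5, 0.5, size=N), rho) for _ in range(reps)]
    return float(np.mean(draws))

N, alpha, rho = 2000, 20.0, 1.0
estimate = mean_intensity(N, alpha, rho)
lower_bound = 1.0 * 2.0 * (rho / alpha)   # f_L * V_1 * (rho / alpha_N) with f_L = 1, V_1 = 2
print(estimate, lower_bound)
```

The estimate should sit somewhat above the lower bound, because the supremum over ball centers picks out the most crowded window.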



5 Consistency of the sample mean function 



Our goal is to establish the consistency of the sample mean for functional spatial data. 
We consider Type B or Type C sampling and obtain rates of convergence. We start with 
a general setup, and show that the rates can be improved in special cases. The general 
results are applied to functional random fields with specific covariance structures. The 
proofs of the main results, Propositions 5.1, 5.2 and 5.3, are collected in Section 8.
For independent or weakly dependent functional observations $X_k$,

(5.1) $$E\Big\| \frac{1}{N} \sum_{k=1}^N X_k - \mu \Big\|^2 = O(N^{-1}).$$

Proposition 5.1 shows that for general functional spatial processes, the rate of consistency may be much slower than $O(N^{-1})$; it is the maximum of $h(\rho_N)$ and $I_{\rho_N}(\mathfrak{S}_N)$ with $\rho_N$ from (ii) of Definition 3.1. Intuitively, the sample mean is consistent if there is a sequence of increasing balls which contain a fraction of points which tends to zero, and the decay of the correlations compensates for the increasing radius of these balls.



Proposition 5.1 Let Assumption 3.1 hold, and assume that $\mathfrak{S}_N$ defines a non-random design of Type A, B or C. Then for any $\rho_N > 0$,

(5.2) $$E\Big\| \frac{1}{N} \sum_{k=1}^N X(s_{k,N}) - \mu \Big\|^2 \le h(\rho_N) + h(0) I_{\rho_N}(\mathfrak{S}_N).$$

Hence, under the Type B or Type C non-random sampling, with $\rho_N$ as in (ii) of Definition 3.1, the sample mean is consistent.
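
The mechanism behind (5.2) is that a double sum of the dependence bound $h$ over all pairs of locations is split at radius $\rho_N$. The sketch below checks that inequality numerically; the exponential $h$, the design on $[0, 50]$ and the function name `mean_loss_bound_check` are assumptions chosen for illustration only.

```python
import numpy as np

def mean_loss_bound_check(points, h, rho):
    """Compare N^-2 * sum_k sum_l h(|s_k - s_l|) with h(rho) + h(0) * I_rho."""
    pts = np.asarray(points, dtype=float)
    dist = np.abs(pts[:, None] - pts[None, :])       # points on the line
    double_sum = h(dist).mean()                      # N^-2 times the sum over all pairs
    intensity = (dist <= rho).sum(axis=1).max() / len(pts)
    return double_sum, h(rho) + h(0.0) * intensity

h = lambda x: np.exp(-x)                             # assumed non-increasing bound function
points = np.linspace(0.0, 50.0, 501)                 # hypothetical design: spacing 0.1 on [0, 50]
for rho in (0.5, 2.0, 5.0):
    lhs, rhs = mean_loss_bound_check(points, h, rho)
    print(rho, lhs, rhs)
```

The left column never exceeds the right one, and the gap shows how conservative the bound is for a given choice of $\rho$.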

Example 5.1 Assume that $N$ points $\{s_{k,N},\ 1 \le k \le N\}$ are on a regular grid in $\alpha_N[-1/2, 1/2]^d$. Then, as we have seen in Section 4.1, $I_\rho(\mathfrak{S}_N)$ is proportional to $(\rho/\alpha_N)^d$. For example, if $h(x) = 1/(1+x)^2$, then choosing $\rho_N = \alpha_N^{d/(d+2)}$ we obtain that

$$h(\rho_N) + h(0) I_{\rho_N}(\mathfrak{S}_N) \ll \alpha_N^{-2d/(d+2)} \vee N^{-1}.$$

(Recall that $I_{\rho_N}(\mathfrak{S}_N) \ge N^{-1}$.)
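
The choice of $\rho_N$ in Example 5.1 can be motivated by balancing the two terms of (5.2). The following sketch sets the proportionality constant in $I_\rho \propto (\rho/\alpha_N)^d$ to 1, an assumption made only to expose the exponents.

```latex
\begin{align*}
h(\rho_N) &= (1+\rho_N)^{-2} \approx \rho_N^{-2}, \qquad
h(0)\, I_{\rho_N}(\mathfrak{S}_N) \approx (\rho_N/\alpha_N)^{d}, \\
\rho_N^{-2} = (\rho_N/\alpha_N)^{d}
 &\iff \rho_N^{d+2} = \alpha_N^{d}
 \iff \rho_N = \alpha_N^{d/(d+2)}, \\
h(\rho_N) &\approx \alpha_N^{-2d/(d+2)} .
\end{align*}
```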



A question that needs to be addressed is whether the bound obtained in Proposition 5.1 is optimal. It is not surprising that (5.2) will not be uniformly optimal. This is because the assumptions in Proposition 5.1 are too general to give a precise rate for all the cases covered. For instance, a smaller bound for Example 5.1 is obtained using Proposition 5.2 below. In some sense, however, the rate (5.2) is optimal, as it is possible to construct examples which attain the bound (5.2).

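Example 5.2 Let $X(s;t) = \xi(s)e(t)$, with $s \in \mathbb{R}$, $t \in [0,1]$, $\int e^2(t)\,dt = 1$ and

$$\xi(s) = \sum_{k \in \mathbb{Z}} \delta_k I\{s \in (U+k, U+k+1]\},$$

where $(\delta_k)$ is an iid sequence with $\delta_1 = \pm 1$, each with probability 1/2, and $U$ is uniformly distributed on $[0,1]$ and independent of $(\delta_k)$. A simple calculation shows that $EX(s;t) = 0$ for all $s, t$ and that $E\langle X(u), X(v)\rangle = (1 - |u-v|) I\{|u-v| \le 1\}$. Let

$$\mathfrak{S}_N = \Big\{ s_{k,N} = \frac{k\alpha_N}{N},\ 1 \le k \le N \Big\}.$$

This sampling scheme is of Type A, B or C, depending on whether $\alpha_N$ remains bounded, $\alpha_N \to \infty$ with $\alpha_N = o(N)$, or $N = O(\alpha_N)$, respectively. In the latter case let us assume for the sake of simplicity that $\alpha_N = N$. Using the explicit formula for $E\langle X(u), X(v)\rangle$ we obtain

$$\begin{aligned}
E\Big\| \frac{1}{N} \sum_{k=1}^N X(s_{k,N}) \Big\|^2
&= \frac{1}{N^2} \sum_{k=1}^N \sum_{\ell=1}^N \Big(1 - \frac{|k-\ell|\alpha_N}{N}\Big) I\Big\{|k-\ell| \le \frac{N}{\alpha_N}\Big\} \\
&= \frac{1}{N} + \frac{2}{N^2} \sum_{h=1}^{N/\alpha_N} \Big(1 - \frac{h\alpha_N}{N}\Big)(N-h) \\
&\sim \begin{cases}
N^{-1} & \text{if } \alpha_N = N; \\
\alpha_N^{-1} & \text{if } \alpha_N \to \infty,\ \alpha_N = o(N); \\
2\int_0^{\min\{1/\alpha,\,1\}} (1 - \alpha|x|)(1 - |x|)\,dx & \text{if } \alpha_N \to \alpha.
\end{cases}
\end{aligned}$$

For Type B and Type C sampling, the optimal bound using Proposition 5.1 is obtained setting $\rho_N = 1$, in which case the r.h.s. in (5.2) is $I_1(\mathfrak{S}_N)$, which is of order $N^{-1}$ if $\alpha_N = N$ and of order $\alpha_N^{-1}$ if $\alpha_N = o(N)$. Under Type A sampling the r.h.s. in (5.2) remains bounded away from zero and the same holds true for the exact quadratic loss.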
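
The three regimes of Example 5.2 can be reproduced numerically from the covariance $E\langle X(u), X(v)\rangle = (1 - |u-v|)I\{|u-v| \le 1\}$ and the design $s_{k,N} = k\alpha_N/N$. The sketch below is illustrative; the function names `exact_loss` and `brute_force_loss` are hypothetical.

```python
import numpy as np

def exact_loss(N, alpha):
    """N^-2 * sum_{k,l} (1 - |k - l| alpha / N)_+ for s_{k,N} = k * alpha / N."""
    lags = np.arange(1, N)
    cov = np.clip(1.0 - lags * alpha / N, 0.0, None)   # covariance at distance lag * alpha / N
    return 1.0 / N + 2.0 / N**2 * np.sum(cov * (N - lags))

def brute_force_loss(N, alpha):
    """Same quantity from the full N x N covariance matrix (small N only)."""
    s = np.arange(1, N + 1) * alpha / N
    dist = np.abs(s[:, None] - s[None, :])
    return np.clip(1.0 - dist, 0.0, None).mean()

N = 10_000
print(N * exact_loss(N, float(N)))       # alpha_N = N: the loss equals 1/N
print(100.0 * exact_loss(N, 100.0))      # alpha_N = N^(1/2): the loss is close to 1/alpha_N
print(exact_loss(N, 2.0))                # alpha_N = 2 fixed: the loss does not vanish
```

For fixed $\alpha_N = 2$ the limit is $2\int_0^{1/2}(1-2x)(1-x)\,dx = 5/12$, so the loss stays bounded away from zero, in line with the Type A case.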

We now consider the special case, where we have a regular sampling design. Here we 
are able to obtain the strongest results. 

Proposition 5.2 Assume the sampling design of Section 4.1. Let Assumption 3.1 hold with $h$ such that $x^{d-1}h(x)$ is monotone on $[b, \infty)$, $b > 0$. Then under Type B sampling

(5.3) $$E\Big\| \frac{1}{S_N} \sum_{k=1}^{S_N} X(s_{k,N}) - \mu \Big\|^2 \le \frac{1}{\alpha_N^d} \Big\{ \cdots \int_0^{K\alpha_N} x^{d-1} h(x)\,dx + o(1) \sup_{x \in [0, K\alpha_N]} x^{d-1} h(x) \Big\}$$

for some large enough constant $K$ which is independent of $N$. Under Type C sampling $1/\alpha_N^d$ in (5.3) is replaced by $O(N^{-1})$.

The technical assumptions on $h$ pose no practical problem; they are satisfied for all important examples, see Example 3.2. A common situation is that $x^{d-1}h(x)$ is increasing on $[0, b]$ and decreasing thereafter.

Our first example shows that for most typical covariance functions, under nearly infill 
domain sampling, the rate of consistency may be much slower than for the iid case, if the 
size of the domain does not increase fast enough. 

Example 5.3 Suppose the functional spatial process has representation (2.5), and (3.7) holds with the covariance functions $\phi_j$ as in Example 3.2 (powered exponential, Matérn or spherical). Define $h(x) = \sum_{j \ge 1} \phi_j(x)$, and assume that condition (3.9) holds. Assumption 3.1 is then satisfied and

(5.4) $$\int_0^\infty x^{d-1} h(x)\,dx < \infty \quad \text{and} \quad \sup_{x \in \mathbb{R}} x^{d-1} h(x) < \infty.$$

Therefore, for the sampling design of Section 4.1

(5.5) $$E\Big\| \frac{1}{S_N} \sum_{k=1}^{S_N} X(s_{k,N}) - \mu \Big\|^2 = \begin{cases} O\big(\alpha_N^{-d} \vee N^{-1}\big), & \text{under Type B sampling} \\ O\big(N^{-1}\big), & \text{under Type C sampling.} \end{cases}$$



The next example shows that formula (5.5) is far from universal, and that the rate of consistency may be even slower if the covariances decay slower than exponential.




Example 5.4 Consider the general setting of Example 5.3, but assume that each covariance function $\phi_j$ has the quadratic rational form

$\phi_j(x)$

Condition (3.9) implies that $h(x) = \sum_{j \ge 1} \phi_j(x)$ satisfies Assumption 3.1, but now $h(x) \sim x^{-2}$, as $x \to \infty$. Because of this rate, condition (5.4) holds only for $d = 1$ (and so for this dimension (5.5) also holds). If $d \ge 2$, (5.4) fails, and to find the rate of the consistency, we must use (5.3) directly. We focus only on Type B sampling, and assume implicitly that the rate is slower than $N^{-1}$. We assume (3.9) throughout this example.

If $d = 2$,

$$\int_0^{K\alpha_N} x^{d-1} h(x)\,dx = O(\ln \alpha_N)$$

and similarly $\sup_{x \le K\alpha_N} x^{d-1} h(x) = O(1)$.

If $d \ge 3$, the leading term is

$$\int_0^{K\alpha_N} x^{d-1} h(x)\,dx = O\big(\alpha_N^{d-2}\big).$$

We summarize these calculations as

$$E\Big\| \frac{1}{S_N} \sum_{k=1}^{S_N} X(s_{k,N}) - \mu \Big\|^2 = \begin{cases} O\big(\alpha_N^{-1}\big), & \text{if } d = 1 \\ O\big(\alpha_N^{-2} \ln(\alpha_N)\big), & \text{if } d = 2 \\ O\big(\alpha_N^{-2}\big), & \text{if } d \ge 3, \end{cases}$$

for Type B sampling scheme (provided the rate is slower than $N^{-1}$).
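
The three rates can be checked by integrating numerically. The sketch below assumes the stand-in $h(x) = 1/(1+x^2)$, which is chosen here only because it decays like $x^{-2}$ and is not claimed to be the paper's covariance model; it also takes $K = 1$ and uses the hypothetical helper `scaled_integral`.

```python
import numpy as np

def scaled_integral(alpha, d, n_grid=1_000_001):
    """alpha^-d * int_0^alpha x^(d-1) h(x) dx for the assumed h(x) = 1 / (1 + x^2)."""
    x = np.linspace(0.0, alpha, n_grid)
    y = x ** (d - 1) / (1.0 + x ** 2)
    integral = np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0   # trapezoidal rule
    return integral / alpha ** d

alpha = 1000.0
print(scaled_integral(alpha, 1) * alpha)                      # O(alpha^-1): ratio stays bounded
print(scaled_integral(alpha, 2) * alpha**2 / np.log(alpha))   # O(alpha^-2 ln alpha)
print(scaled_integral(alpha, 3) * alpha**2)                   # O(alpha^-2)
```

For this particular $h$ the integrals have the closed forms $\arctan\alpha$, $\tfrac12\ln(1+\alpha^2)$ and $\alpha - \arctan\alpha$ for $d = 1, 2, 3$, which makes the orders $\alpha_N^{-1}$, $\alpha_N^{-2}\ln\alpha_N$ and $\alpha_N^{-2}$ visible directly.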

The last example shows that for very persistent spatial dependence, the rate of con- 
sistency can be essentially arbitrarily slow. 

Example 5.5 Assume that $h(x)$ decays only at a logarithmic rate, $h(x) = \{\log(x \vee e)\}^{-1}$. Then, for any $d \ge 1$, the left hand side in (5.3) is $O\big((\log \alpha_N)^{-1}\big)$.

We now turn to the case of the random design. 



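Proposition 5.3 Assume the random sampling design of Section 4.2. If the sequence $\{s_k\}$ is independent of the process $X$, and if Assumption 3.1 holds, then we have for any $\varepsilon_N > 0$

$$E\Big\| \frac{1}{N} \sum_{k=1}^N X(s_{k,N}) - \mu \Big\|^2 \le V(\varepsilon_N) \sup_{s \in R_0} f^2(s) + h(\alpha_N \varepsilon_N) + \frac{h(0)}{N},$$

where

(5.6) $$V(\varepsilon_N) = \mathrm{Vol}\{(s, r) \in R_0^2 : \|s - r\|_2 \le \varepsilon_N\}.$$

Choosing $\varepsilon_N$ such that $\varepsilon_N \to 0$ and $\alpha_N \varepsilon_N \to \infty$, it follows that under Type B or Type C sampling, the sample mean is consistent.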



The bound in Proposition 5.3 can be easily applied to any specific random sampling design and any model for the functions $\phi_j$ in (2.5). It nicely shows that what matters for the rate of consistency is the interplay between the rate of growth of the sampling domain and the rate of decay of dependence. For typical sets $R_0$, $V(\varepsilon_N)$ is proportional to $\varepsilon_N$. Taking $\varepsilon_N = N^{-1}$, we see that the rate of consistency is $h(\alpha_N/N) \vee N^{-1}$. For typical covariance functions $\phi_j$, like powered exponential, Matérn or spherical, $h(\alpha_N/N)$ decays faster than $N^{-1}$, provided $\alpha_N$ increases faster than $N$. In such cases, the rate of consistency is the same as for an iid sample. For ease of reference, we formulate the following corollary, which can be used in practical applications.
Corollary 5.1 Assume the random sampling design of Section 4.2 with the sequence $\{s_k\}$ independent of the process $X$. Suppose (2.5) and (3.7) hold with the $\phi_j$ in one of the families specified in Example 3.2. If Condition (3.9) holds, and $\alpha_N \ge aN \ln N$, for some $a > 0$, then (5.1) holds.
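
To see how the bound of Proposition 5.3 behaves for a concrete choice of $\varepsilon_N$, one can compare it with an exactly computable loss. The sketch below rests on assumptions that go beyond the text: $d = 1$, the uniform density on $R_0 = [-1/2, 1/2]$, $h(x) = e^{-x}$, and equality (rather than an upper bound) in $E\langle X(s) - \mu, X(r) - \mu\rangle = h(|s - r|)$. The function names are hypothetical.

```python
import numpy as np

def exact_loss_uniform(N, alpha):
    """E||mean - mu||^2 when E<X(s)-mu, X(r)-mu> = exp(-|s - r|) and s_k ~ U[-1/2, 1/2]."""
    pair = 2.0 * (alpha - 1.0 + np.exp(-alpha)) / alpha**2   # E exp(-alpha |s_1 - s_2|)
    return (1.0 - 1.0 / N) * pair + 1.0 / N                  # off-diagonal plus diagonal terms

def bound_uniform(N, alpha, eps):
    """Right-hand side of the bound: V(eps) sup f^2 + h(alpha eps) + h(0) / N."""
    volume = 2.0 * eps - eps**2          # Vol{(s, r) in [-1/2, 1/2]^2 : |s - r| <= eps}, eps <= 1
    return volume * 1.0 + np.exp(-alpha * eps) + 1.0 / N

N, alpha = 1000, 50.0
for eps in (0.02, 0.05, 0.1, 0.3):
    print(eps, exact_loss_uniform(N, alpha), bound_uniform(N, alpha, eps))
```

The bound holds for every $\varepsilon$ but is tightest for an intermediate value, which reflects the trade-off between $V(\varepsilon_N)$ and $h(\alpha_N\varepsilon_N)$.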



6 Consistency of the empirical covariance operator 

In Section 5 we found the rates of consistency for the functional sample mean. We now turn to the rates for the sample covariance operator. Assuming the functional observations have mean zero, the natural estimator of the covariance operator $C$ is the sample covariance operator given by

$$\widehat{C}_N = \frac{1}{N} \sum_{k=1}^N X(s_k) \otimes X(s_k).$$

In general, the sample covariance operator is defined by

$$\widehat{\Gamma}_N = \frac{1}{N} \sum_{k=1}^N (X(s_k) - \bar{X}_N) \otimes (X(s_k) - \bar{X}_N),$$

where

$$\bar{X}_N = \frac{1}{N} \sum_{k=1}^N X(s_k).$$

Both operators are implemented in statistical software packages, for example in the popular R package fda and in a similar MATLAB package, see Ramsay et al. (2009). The operator $\widehat{\Gamma}_N$ is used to compute the EFPC's for centered data, while $\widehat{C}_N$ is used for data without centering.
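
For curves observed on a common grid, both operators reduce to matrices. The following sketch is one simple discretization (a Riemann-sum approximation on an equispaced grid) and is not the implementation used by the packages mentioned above; the function names and the toy data are hypothetical.

```python
import numpy as np

def covariance_kernels(X):
    """Discretized kernels of C_hat (no centering) and Gamma_hat (centered); X has shape (N, T)."""
    N = X.shape[0]
    c_hat = X.T @ X / N
    centered = X - X.mean(axis=0)
    gamma_hat = centered.T @ centered / N
    return c_hat, gamma_hat

def efpcs(kernel, dt, n_components=2):
    """Eigenvalues and L2-normalized eigenfunctions of the integral operator with this kernel."""
    values, vectors = np.linalg.eigh(kernel * dt)     # Riemann-sum approximation of the integral
    order = np.argsort(values)[::-1][:n_components]
    return values[order], vectors[:, order].T / np.sqrt(dt)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
scores = rng.normal(size=(50, 2)) * np.array([1.0, 0.5])
X = 1.0 + scores @ np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t),
                              np.sqrt(2) * np.cos(2 * np.pi * t)])   # toy curves with mean 1
c_hat, gamma_hat = covariance_kernels(X)
eigenvalues, eigenfunctions = efpcs(gamma_hat, dt)
```

The two kernels differ exactly by the rank-one kernel $\bar{X}_N(t)\bar{X}_N(u)$, which is why the choice between them matters only when the mean is not zero.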

We first derive the rates of consistency for $\widehat{C}_N$ assuming $EX(s) = 0$. Then we turn to the operator $\widehat{\Gamma}_N$. The proofs are obtained by applying the technique developed for the estimation of the functional mean. It is a general approach based on the estimation of the second moments of an appropriate norm (between estimator and estimand) so that the conditions in Definition 3.1 can come into play. It is broadly applicable to all statistics obtained by simple averaging of quantities defined at single spatial locations. The proofs are thus similar to those presented in the simplest case in Section 8, but the notation becomes more cumbersome because of the increased complexity of the objects to be averaged. To conserve space these proofs are not included.

We begin by observing that

$$E\|\widehat{C}_N - C\|_{\mathcal{S}}^2 = E\langle \widehat{C}_N - C, \widehat{C}_N - C \rangle_{\mathcal{S}} = \frac{1}{N^2} \sum_{k=1}^N \sum_{\ell=1}^N E\langle X(s_k) \otimes X(s_k) - C,\ X(s_\ell) \otimes X(s_\ell) - C \rangle_{\mathcal{S}}.$$

It follows that under Assumption 3.2

(6.1) $$E\|\widehat{C}_N - C\|_{\mathcal{S}}^2 \le \frac{1}{N^2} \sum_{k=1}^N \sum_{\ell=1}^N H(\|s_k - s_\ell\|_2).$$

Relation (6.1) is used as the starting point of all proofs, cf. the proof of Proposition 5.1 in Section 8. Modifying the proofs of Section 8 we arrive at the following results.

Proposition 6.1 Let Assumption 3.2 hold, and assume that $\mathfrak{S}_N$ defines a non-random design of Type A, B or C. Then for any $\rho_N > 0$

$$E\|\widehat{C}_N - C\|_{\mathcal{S}}^2 \le H(\rho_N) + H(0) I_{\rho_N}(\mathfrak{S}_N).$$

Hence under the Type B or Type C non-random sampling, with $\rho_N$ as in (ii) of Definition 3.1, the empirical covariance operator is consistent.

Proposition 6.2 Assume the sampling design of Section 4.1. Let Assumption 3.2 hold, with some function $H$ such that $x^{d-1}H(x)$ is monotone on $[b, \infty)$, $b > 0$. Then under Type B sampling

$$E\|\widehat{C}_N - C\|_{\mathcal{S}}^2 \le \frac{1}{\alpha_N^d} \Big\{ \cdots \int_0^{K\alpha_N} x^{d-1} H(x)\,dx + o(1) \sup_{x \in [0, K\alpha_N]} x^{d-1} H(x) \Big\}$$

for some large enough constant $K$ which is independent of $N$. Under Type C sampling, the factor $1/\alpha_N^d$ is replaced by $O(N^{-1})$.

Proposition 6.3 Assume the random sampling design of Section 4.2. If the sequence $\{s_k\}$ is independent of the process $X$ and if Assumption 3.2 holds, then we have for any $\varepsilon_N > 0$,

$$E\|\widehat{C}_N - C\|_{\mathcal{S}}^2 \le V(\varepsilon_N) \sup_{s \in R_0} f^2(s) + H(\alpha_N \varepsilon_N) + \frac{H(0)}{N},$$

with $V(\varepsilon_N)$ given by (5.6).

It follows that under Type B or Type C sampling the sample covariance operator is consistent.

Example 6.1 Let $X$ have representation (2.5), in which the scalar fields $\xi_j(\cdot)$ are independent and Gaussian, and (12321) (12331) and (1231) hold.
It follows that for some large enough constant A, 

E Cov (^( s i)4( s 2)) + E E fe( s ^) 12 

j>i i>i 
< Aexp ( - 2p~ 1 ||si - s 2 || 2 ). 
Hence by Lemma 3.1, Assumption 3.2 holds with $H(x) = A \exp(-2\rho^{-1} x)$. Proposition 6.1 yields consistency of the estimator under Type B or Type C sampling, as

$$E\|\widehat{C}_N - C\|_{\mathcal{S}}^2 \le A \big( \exp(-2\rho^{-1}\rho_N) + I_{\rho_N}(\mathfrak{S}_N) \big).$$

If we assume a regular sampling design, then by Proposition 6.2

$$E\|\widehat{C}_N - C\|_{\mathcal{S}}^2 \le A \Big( \frac{1}{\alpha_N^d} + \frac{1}{N} \Big).$$

Introducing the (unobservable) operator

$$\widetilde{\Gamma}_N = \frac{1}{N} \sum_{k=1}^N (X(s_k) - \mu) \otimes (X(s_k) - \mu),$$

we see that

$$\widetilde{\Gamma}_N - \widehat{\Gamma}_N = (\bar{X}_N - \mu) \otimes (\bar{X}_N - \mu).$$

Therefore

$$E\|\widehat{\Gamma}_N - C\|_{\mathcal{S}}^2 \le 2E\|\widetilde{\Gamma}_N - C\|_{\mathcal{S}}^2 + 2E\|(\bar{X}_N - \mu) \otimes (\bar{X}_N - \mu)\|_{\mathcal{S}}^2.$$

The bounds in Propositions 6.1, 6.2 and 6.3 apply to $E\|\widetilde{\Gamma}_N - C\|_{\mathcal{S}}^2$. Observe that

$$E\|(\bar{X}_N - \mu) \otimes (\bar{X}_N - \mu)\|_{\mathcal{S}}^2 = E\|\bar{X}_N - \mu\|^4.$$

If $X(s)$ are bounded variables, i.e. $\sup_{t \in [0,1]} |X(s;t)| \le B < \infty$ a.s., then $\|\bar{X}_N - \mu\|^4 \le 4B^2 \|\bar{X}_N - \mu\|^2$. It follows that under Assumption 3.1 we obtain the same order of magnitude for the bounds of $E\|\bar{X}_N - \mu\|^4$ as we have obtained in Propositions 5.1, 5.2 and 5.3 for $E\|\bar{X}_N - \mu\|^2$. In general $E\|\bar{X}_N - \mu\|^4$ can neither be bounded in terms of $E\|\bar{X}_N - \mu\|^2$ nor with $E\|\widehat{C}_N - C\|_{\mathcal{S}}^2$. To bound fourth order moments, conditions on the covariance between the variables $Z_{k,\ell} := \langle X(s_{k,N}) - \mu, X(s_{\ell,N}) - \mu \rangle$ and $Z_{i,j}$ for all $1 \le i, j, k, \ell \le N$ are unavoidable. However, a simpler general approach is to require higher order moments of $\|X(s)\|$. More precisely, we notice that for any $p > 1$, by the Hölder inequality,

$$E\|\bar{X}_N - \mu\|^4 \le \big( E\|\bar{X}_N - \mu\|^2 \big)^{1/p} \Big( E\|\bar{X}_N - \mu\|^{\frac{4p-2}{p-1}} \Big)^{(p-1)/p}.$$

Thus as long as $E\|X(s)\|^{\frac{4p-2}{p-1}} < \infty$, we conclude that, by stationarity,

$$E\|\bar{X}_N - \mu\|^4 \le M(p) \big( E\|\bar{X}_N - \mu\|^2 \big)^{1/p},$$

where $M(p)$ depends on the distribution of $X(s)$ and on $p$, but not on $N$. It is now evident how the results of Section 5 can be used to obtain bounds for $E\|\widehat{\Gamma}_N - C\|_{\mathcal{S}}^2$. We state in Proposition 6.4 the version for the general non-random design. The special cases follow, and the random designs are treated analogously. It follows that if Assumptions 3.1 and 3.2 hold, then $E\|\widehat{\Gamma}_N - C\|_{\mathcal{S}}^2 \to 0$, under Type B or C sampling, provided $E\|X(s)\|^{4+\delta} < \infty$.






Proposition 6.4 Let Assumptions 3.1 and 3.2 hold and assume that for some $\delta > 0$ we have $E\|X(s)\|^{4+\delta} < \infty$. Assume further that $\mathfrak{S}_N$ defines a non-random design of Type A, B or C. Then for any $\rho_N > 0$ we have

(6.2) $$E\|\widehat{\Gamma}_N - C\|_{\mathcal{S}}^2 \le 2\{H(\rho_N) + H(0) I_{\rho_N}(\mathfrak{S}_N)\} + 2C(\delta)\{h(\rho_N) + h(0) I_{\rho_N}(\mathfrak{S}_N)\}^{\frac{\delta}{2+\delta}}.$$

If $X(s_1)$ is a.s. bounded by some finite constant $B$, then we can formally let $\delta$ in (6.2) go to $\infty$, with $C(\infty) = 4B^2$.
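
The exponent in (6.2) can be traced back to the Hölder step that precedes the proposition. The short derivation below is a sketch that matches the moment condition $E\|X(s)\|^{4+\delta} < \infty$ with the exponent $(4p-2)/(p-1)$ used there.

```latex
\begin{align*}
\frac{4p-2}{p-1} = 4+\delta
 &\iff 4p-2 = (4+\delta)(p-1)
 \iff \delta p = 2+\delta
 \iff \frac{1}{p} = \frac{\delta}{2+\delta}, \\
E\|\bar X_N-\mu\|^4
 &\le M(p)\,\bigl(E\|\bar X_N-\mu\|^2\bigr)^{\delta/(2+\delta)},
 \qquad \frac{\delta}{2+\delta}\to 1 \ \text{as}\ \delta\to\infty .
\end{align*}
```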



7 Inconsistent empirical functional principal components

We begin by formalizing the intuition behind Example 2.2. By Lemma 2.1 the claims in that example follow from Proposition 7.1. Recall that $X^\star = X(0) \otimes X(0)$, and observe that for $x \in L^2$,

$$X^\star(x)(t) = \Big( \int X(0;u) x(u)\,du \Big) X(0;t) = \int c^\star(t,u) x(u)\,du,$$

where

$$c^\star(t,u) = X(0;t) X(0;u).$$

Since

$$E \iint (c^\star(t,u))^2\,dt\,du = E\|X(0)\|^4 < \infty,$$

the operator $X^\star$ is Hilbert-Schmidt almost surely.

Proposition 7.1 Suppose representation (2.4) holds with stationary mean zero Gaussian processes $\xi_j$ such that

$$E[\xi_j(s) \xi_j(s + \mathbf{h})] = \lambda_j \rho_j(h), \quad h = \|\mathbf{h}\|,$$

where each $\rho_j$ is a continuous correlation function, and $\sum_j \lambda_j < \infty$. Assume the processes $\xi_i$ and $\xi_j$ are independent if $i \ne j$. If $\mathfrak{S}_N = \{s_1, s_2, \ldots, s_N\} \subset \mathbb{R}^d$ with $s_n \to 0$, then

(7.1) $$\lim_{N\to\infty} E\|\widehat{C}_N - X^\star\|_{\mathcal{S}}^2 = 0.$$

Proposition 7.1 is proven in Section 8.

We now present a very specific example that illustrates Proposition 7.1.

FIGURE 7.1 Ten simulated EFPC's $\hat{v}_1$ for process (7.2) with $\lambda = 0.5$ and $e_1(t) = \sqrt{2}\sin(2\pi t)$, $e_2(t) = \sqrt{2}\cos(2\pi t)$ ($N = 100$).




Example 7.1 Suppose

(7.2) $$X(s;t) = \zeta_1(s) e_1(t) + \sqrt{\lambda}\,\zeta_2(s) e_2(t),$$

where $\zeta_1$ and $\zeta_2$ are iid processes on the line, and $0 < \lambda < 1$. Assume that the processes $\zeta_1$ and $\zeta_2$ are Gaussian with mean zero and covariances $E[\zeta_j(s)\zeta_j(s+h)] = \exp\{-h^2\}$, $j = 1, 2$. Thus, each $Z_j := \zeta_j(0)$ is standard normal. Rearranging the terms, we obtain

$$X^\star(x) = \big( Z_1^2 \langle x, e_1 \rangle + \sqrt{\lambda} Z_1 Z_2 \langle x, e_2 \rangle \big) e_1 + \big( \sqrt{\lambda} Z_1 Z_2 \langle x, e_1 \rangle + \lambda Z_2^2 \langle x, e_2 \rangle \big) e_2.$$

The matrix

$$\begin{bmatrix} Z_1^2 & \sqrt{\lambda} Z_1 Z_2 \\ \sqrt{\lambda} Z_1 Z_2 & \lambda Z_2^2 \end{bmatrix}$$

has only one positive eigenvalue $Z_1^2 + \lambda Z_2^2 = \|X(0)\|^2$. A normalized eigenfunction associated with it is

(7.3) $$f := \frac{X(0)}{\|X(0)\|} = \big[ Z_1^2 + \lambda Z_2^2 \big]^{-1/2} \big( Z_1 e_1 + \sqrt{\lambda} Z_2 e_2 \big).$$

Denote by $\hat{v}_1$ a normalized eigenfunction corresponding to the largest eigenvalue of $\widehat{C}_N$. By Lemma 2.1, $\hat{v}_1$ is close in probability to $\mathrm{sign}(\langle \hat{v}_1, f \rangle) f$. It is thus not close to $\mathrm{sign}(\langle \hat{v}_1, e_1 \rangle) e_1$.

Ten simulated $\hat{v}_1$, with $e_1(t) = \sqrt{2}\sin(2\pi t)$, $e_2(t) = \sqrt{2}\cos(2\pi t)$, $\lambda = 0.5$, are shown in Figure 7.1. The EFPC $\hat{v}_1$ is a linear combination of $e_1$ and $e_2$ with random weights. As formula (7.3) suggests, the function $e_1$ is likely to receive a larger weight. The weights, and so the simulated $\hat{v}_1$, cluster because both $Z_1$ and $Z_2$ are standard normal.
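
A small simulation makes the alignment of $\hat{v}_1$ with $f$ visible. The sketch below is not the authors' simulation code: it assumes the locations $s_n = 1/n$ (the text only requires $s_n \to 0$), works in the coordinates of the basis $(e_1, e_2)$, and uses the hypothetical helpers `simulate_fields` and `alignment`.

```python
import numpy as np

def simulate_fields(points, rng):
    """Draw two independent mean-zero Gaussian processes with covariance exp(-h^2) at the points."""
    K = np.exp(-(points[:, None] - points[None, :]) ** 2)
    w, V = np.linalg.eigh(K)
    root = V * np.sqrt(np.clip(w, 0.0, None))     # K is nearly singular: clip tiny negative eigenvalues
    return root @ rng.normal(size=(len(points), 2))

def alignment(N, lam, rng):
    """|<v_1, f>| for the leading EFPC v_1 of C_N and f from (7.3), in the basis (e_1, e_2)."""
    points = np.concatenate([[0.0], 1.0 / np.arange(1, N + 1)])   # location 0, then s_n = 1/n
    zeta = simulate_fields(points, rng)
    coef = zeta * np.array([1.0, np.sqrt(lam)])   # coordinates of X(s) with respect to e_1, e_2
    f = coef[0] / np.linalg.norm(coef[0])         # normalized X(0); s = 0 is not part of the sample
    M = coef[1:].T @ coef[1:] / N                 # matrix of C_N in the basis (e_1, e_2)
    v1 = np.linalg.eigh(M)[1][:, -1]              # eigenvector of the largest eigenvalue
    return abs(v1 @ f)

rng = np.random.default_rng(0)
values = [alignment(100, 0.5, rng) for _ in range(50)]
print(np.median(values))
```

A median alignment near 1 indicates that $\hat{v}_1$ tracks the random direction $f$ rather than the fixed function $e_1$.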
We now state a general result showing that Type A sampling generally leads to inconsistent estimators if the spatial dependence does not vanish.

Proposition 7.2 Assume that $E\langle X(s_1) - \mu, X(s_2) - \mu \rangle \ge b(\|s_1 - s_2\|_2) > 0$, where $b(x)$ is non-increasing. Then under Type A sampling the sample mean $\bar{X}_N$ is not a consistent estimator of $\mu$. Similarly, if $EX(s) = 0$ and

(7.4) $$E\langle X(s_1) \otimes X(s_1) - C,\ X(s_2) \otimes X(s_2) - C \rangle_{\mathcal{S}} \ge B(\|s_1 - s_2\|_2) > 0,$$

where $B(x)$ is non-increasing, then under Type A sampling the sample covariance $\widehat{C}_N$ is not a consistent estimator of $C$.

We illustrate Proposition 7.2 with an example that complements Example 2.2 and Proposition 7.1 in the sense that in Proposition 7.1 the functional model was complex, but the spatial distribution of the $s_k$ simple. In Example 7.2 we allow a general Type A distribution, but consider the simple model (7.2).

Example 7.2 We focus on condition (7.4) for the FPC's. For the general model (2.4), the left-hand side of (7.4) is equal to

$$\kappa(s_1, s_2) = \sum_{i,j \ge 1} \mathrm{Cov}\big( \xi_i(s_1)\xi_j(s_1),\ \xi_i(s_2)\xi_j(s_2) \big).$$

If the processes $\xi_j$ satisfy the assumptions of Proposition 7.1 then, by Lemma 2.2,

$$\mathrm{Cov}\big( \xi_i(s_1)\xi_j(s_1),\ \xi_i(s_2)\xi_j(s_2) \big) = \lambda_i^2 r_i + \lambda_j^2 r_j + \lambda_i \lambda_j \frac{r_i + r_j}{2} - \big( \lambda_i^{3/2} r_i + \lambda_j^{3/2} r_j \big) \sqrt{\lambda_i + \lambda_j},$$

where $r_i = \rho_i(\|s_1 - s_2\|)$.

To calculate $\kappa(s_1, s_2)$ in a simple case, corresponding to (7.2), suppose

(7.5) $$\lambda_1 = 1,\quad \lambda_2 = \lambda,\quad 0 < \lambda < 1,\quad \lambda_i = 0,\ i > 2, \quad \text{and} \quad \rho_1 = \rho_2 = \rho.$$

Then,

$$\kappa(s_1, s_2) = f(\lambda) \rho(\|s_1 - s_2\|),$$

where

$$f(\lambda) = (3 - 2\sqrt{2})(1 + \lambda^2) + 2\big[ 1 + \lambda + \lambda^2 - (1 + \lambda^{3/2})(1 + \lambda)^{1/2} \big].$$

The function $f$ increases from about 0.17 at $\lambda = 0$ to about 0.69 at $\lambda = 1$.

We have verified that if the functional random field (2.4) satisfies the assumptions of Proposition 7.1 and (7.5), then $\widehat{C}_N$ is an inconsistent estimator of $C$ under Type A sampling, whenever $\rho(h)$ is a nonincreasing function of $h$.
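
The stated endpoint values of $f$ can be confirmed by direct evaluation. This is a minimal sketch; the monotonicity check is only carried out on a finite grid.

```python
import numpy as np

def f(lam):
    """The factor f(lambda) from Example 7.2."""
    lam = np.asarray(lam, dtype=float)
    return ((3.0 - 2.0 * np.sqrt(2.0)) * (1.0 + lam**2)
            + 2.0 * (1.0 + lam + lam**2 - (1.0 + lam**1.5) * np.sqrt(1.0 + lam)))

grid = np.linspace(0.0, 1.0, 101)
print(f(0.0), f(1.0))                        # approximately 0.17 and 0.69
print(bool(np.all(np.diff(f(grid)) > 0)))    # f is increasing on this grid
```

Since $f(\lambda) > 0$ on $[0, 1]$, the lower bound $B$ in (7.4) can be taken as $f(\lambda)\rho$ for this model.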



8 Proofs of the results of Sections O, M and [7] 

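Proof of Proposition 5.1. By Assumption 3.1 we have

$$\begin{aligned}
E\Big\| \frac{1}{N} \sum_{k=1}^N X(s_{k,N}) - \mu \Big\|^2
&= \frac{1}{N^2} \sum_{k=1}^N \sum_{\ell=1}^N E\langle X(s_{k,N}) - \mu,\ X(s_{\ell,N}) - \mu \rangle \\
&\le \frac{1}{N^2} \sum_{k=1}^N \sum_{\ell=1}^N h(\|s_{k,N} - s_{\ell,N}\|_2) \\
&\le \frac{1}{N^2} \sum_{k=1}^N \sum_{\ell=1}^N \big( h(\rho_N) I\{\|s_{k,N} - s_{\ell,N}\|_2 > \rho_N\} + h(0) I\{\|s_{k,N} - s_{\ell,N}\|_2 \le \rho_N\} \big) \\
&\le h(\rho_N) + h(0) I_{\rho_N}(\mathfrak{S}_N).
\end{aligned}$$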

The following Lemma is a simple calculus problem and will be used in the proof of Proposition 5.2.

LEMMA 8.1 Assume that $f$ is a non-negative function which is monotone on $[0, b]$ and on $[b, \infty)$. Then

Jl / k \ i r L / N o 

^ \Jy J Jy Jo Jy xe[o,L/N] 

Proof of Proposition 5.2. By Assumption 3.1

$$E\Big\| \frac{1}{S_N} \sum_{k=1}^{S_N} X(s_{k,N}) - \mu \Big\|^2 \le \frac{1}{S_N^2} \sum_{k=1}^{S_N} \sum_{\ell=1}^{S_N} h(\|s_{k,N} - s_{\ell,N}\|_2).$$

Let $a = (a_1, \ldots, a_d)$ and $b = (b_1, \ldots, b_d)$ be two elements on $\mathcal{Z}(\delta)$. We define $d(a, b) = \max_{1 \le i \le d} v_i(a, b)$, where $v_i(a, b)$ is the number of edges between $a_i$ and $b_i$. For any two points $s_{k,N}$ and $s_{\ell,N}$ we have

(8.1) $$d(s_{k,N}, s_{\ell,N}) = m \quad \text{for some } m \in \{0, \ldots, K'N^{1/d}\},$$

where $K'$ depends on $\mathrm{diam}(R_0)$. It is easy to see that the number of points on the grid having distance $m$ from a given point is less than $2d(2m+1)^{d-1}$, $m \ge 0$. Hence the number of pairs for which (8.1) holds is $\le 2d(2m+1)^{d-1} N$. On the other hand, if $d(s_{k,N}, s_{\ell,N}) = m$, then $\|s_{k,N} - s_{\ell,N}\|_2 \ge m\delta_0\eta_N$. Let us assume without loss of generality that $\delta_0 = 1$. Noting that there is no loss of generality if we assume that $x^{d-1}h(x)$ is also monotone on $[0, b]$, we obtain by Lemma 8.1 for large enough $N$ and $K < K' < K''$



^ Sn Sn 

£2" ^ y^h(W 8 *>,N - s e,nh) 



N k=l 1=1 

111 = 1 



(2m+l) d - 1 1 , \ 2h(0) 
h(mri N ) H — 



m=0 
-1 / /•if"ZV 1 /d-l 



/ q n d-1 K'N 1 / d +l , \ d—1 / \ 1 r>k/TA 



<2d(— 
+ |- sup + 

x l h[x)dx 



(Nr] N x) d ~ 1 h(Nr] N x)dx 
2h(0) 



+ 



i 

A 

4d(3A) 



tt 7V JO 



Q: 



d-1 
N 



N 1 ^ x G fO,^"a 



sup x + 



G[0,/<"aAr/A] 



A/ 



By Lemma 4.2, Type B sampling implies $\alpha_N \to \infty$ and $\alpha_N = o(N^{1/d})$. This shows (5.3). Under Type C sampling $\alpha_N^{-d} \ll 1/N$. The proof is finished.



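Proof of Proposition 5.3. This time we have

$$\begin{aligned}
E\Big\| \frac{1}{N} \sum_{k=1}^N X(s_{k,N}) - \mu \Big\|^2
&= \frac{1}{N^2} \sum_{k=1}^N \sum_{\ell=1}^N E\langle X(s_{k,N}) - \mu,\ X(s_{\ell,N}) - \mu \rangle \\
&\le \frac{1}{N^2} \sum_{k=1}^N \sum_{\ell=1}^N E\,h(\|s_{k,N} - s_{\ell,N}\|_2) \\
&\le \alpha_N^{-2d} \int_{R_N} \int_{R_N} h(\|s - r\|_2) f(\alpha_N^{-1} s) f(\alpha_N^{-1} r)\,ds\,dr + \frac{h(0)}{N} \\
&= \int_{R_0} \int_{R_0} h(\alpha_N \|s - r\|_2) f(s) f(r)\,ds\,dr + \frac{h(0)}{N}.
\end{aligned}$$

Furthermore, for any $\varepsilon_N > 0$,

$$\begin{aligned}
\int_{R_0} \int_{R_0} h(\alpha_N \|s - r\|_2) f(s) f(r)\,ds\,dr
&\le h(0) \int_{R_0} \int_{R_0} f(s) f(r) I\{\|s - r\|_2 \le \varepsilon_N\}\,ds\,dr + h(\alpha_N \varepsilon_N) \\
&\le \sup_{s \in R_0} f^2(s) \times \mathrm{Vol}\{(s, r) \in R_0^2 : \|s - r\|_2 \le \varepsilon_N\} + h(\alpha_N \varepsilon_N).
\end{aligned}$$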

sG-Ro 

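Proof of Proposition 7.1. Observe that

$$\|\widehat{C}_N - X^\star\|_{\mathcal{S}}^2 = \iint \Big\{ \frac{1}{N} \sum_{n=1}^N \big[ X(s_n;t) X(s_n;u) - X(0;t) X(0;u) \big] \Big\}^2 dt\,du.$$

Therefore,

$$\|\widehat{C}_N - X^\star\|_{\mathcal{S}}^2 \le 2 I_1(N) + 2 I_2(N),$$

where

$$I_1(N) = \iint \Big| \frac{1}{N} \sum_{n=1}^N X(s_n;t) \big( X(s_n;u) - X(0;u) \big) \Big|^2 dt\,du$$

and

$$I_2(N) = \iint \Big| \frac{1}{N} \sum_{n=1}^N X(0;u) \big( X(s_n;t) - X(0;t) \big) \Big|^2 dt\,du.$$

We will show that $E I_1(N) \to 0$. The argument for $I_2(N)$ is the same. Observe that

$$\begin{aligned}
I_1(N) &= \frac{1}{N^2} \sum_{k,\ell=1}^N \iint X(s_k;t) \big( X(s_k;u) - X(0;u) \big) X(s_\ell;t) \big( X(s_\ell;u) - X(0;u) \big)\,dt\,du \\
&= \frac{1}{N^2} \sum_{k,\ell=1}^N \int X(s_k;t) X(s_\ell;t)\,dt \int \big( X(s_k;u) - X(0;u) \big) \big( X(s_\ell;u) - X(0;u) \big)\,du.
\end{aligned}$$

Thus,

$$E I_1(N) \le \frac{1}{N^2} \sum_{k,\ell=1}^N \Big\{ E \Big( \int X(s_k;t) X(s_\ell;t)\,dt \Big)^2 \Big\}^{1/2} \Big\{ E \Big( \int Y_k(u) Y_\ell(u)\,du \Big)^2 \Big\}^{1/2},$$

where

$$Y_k(u) = X(s_k;u) - X(0;u).$$

We first deal with the integration over $t$:

$$E \Big( \int X(s_k;t) X(s_\ell;t)\,dt \Big)^2 \le E \int X^2(s_k;t)\,dt \int X^2(s_\ell;t)\,dt = E\big[ \|X(s_k)\|^2 \|X(s_\ell)\|^2 \big] \le \{E\|X(s_k)\|^4\}^{1/2} \{E\|X(s_\ell)\|^4\}^{1/2} = E\|X(0)\|^4.$$

We thus see that

$$\begin{aligned}
E I_1(N) &\le \{E\|X(0)\|^4\}^{1/2} \frac{1}{N^2} \sum_{k,\ell=1}^N \Big\{ E \Big( \int Y_k(u) Y_\ell(u)\,du \Big)^2 \Big\}^{1/2} \\
&\le \{E\|X(0)\|^4\}^{1/2} \frac{1}{N^2} \sum_{k,\ell=1}^N \Big\{ E \Big( \int Y_k^2(u)\,du \Big)^2 \Big\}^{1/4} \Big\{ E \Big( \int Y_\ell^2(u)\,du \Big)^2 \Big\}^{1/4} \\
&= \{E\|X(0)\|^4\}^{1/2} \Big[ \frac{1}{N} \sum_{k=1}^N \Big\{ E \Big( \int Y_k^2(u)\,du \Big)^2 \Big\}^{1/4} \Big]^2.
\end{aligned}$$

Consequently, to complete the verification of (7.1), it suffices to show that

$$\lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^N \Big\{ E \Big( \int Y_k^2(u)\,du \Big)^2 \Big\}^{1/4} = 0.$$

The above relation will follow from

(8.2) $$\lim_{k\to\infty} E \Big( \int Y_k^2(u)\,du \Big)^2 = 0.$$

To verify (8.2), first notice that, by the orthonormality of the $e_j$,

$$\int Y_k^2(u)\,du = \sum_{j=1}^\infty \big( \xi_j(s_k) - \xi_j(0) \big)^2.$$

Therefore, by the independence of the processes $\xi_j$,

$$E \Big( \int Y_k^2(u)\,du \Big)^2 = \sum_{j=1}^\infty E\big( \xi_j(s_k) - \xi_j(0) \big)^4 + \sum_{i \ne j} E\big( \xi_i(s_k) - \xi_i(0) \big)^2 E\big( \xi_j(s_k) - \xi_j(0) \big)^2.$$

The covariance structure was specified so that

$$E\big( \xi_j(s_k) - \xi_j(0) \big)^2 = 2\lambda_j \big( 1 - \rho_j(\|s_k\|) \big),$$

so the normality yields

$$E \Big( \int Y_k^2(u)\,du \Big)^2 \le 12 \sum_{j=1}^\infty \lambda_j^2 \big( 1 - \rho_j(\|s_k\|) \big)^2 + 4 \Big\{ \sum_{j=1}^\infty \lambda_j \big( 1 - \rho_j(\|s_k\|) \big) \Big\}^2.$$

The right hand side tends to zero by the Dominated Convergence Theorem. This establishes (8.2), and completes the proof of (7.1).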



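Proof of Proposition 7.2. We only check inconsistency of the sample mean. In view of the proof of Proposition 5.1 we have now the lower bound

$$E\Big\| \frac{1}{N} \sum_{k=1}^N X(s_{k,N}) - \mu \Big\|^2 \ge \frac{1}{N^2} \sum_{k=1}^N \sum_{\ell=1}^N b(\|s_{k,N} - s_{\ell,N}\|_2) \ge b(\rho) I_{\rho/2}^2(\mathfrak{S}_N),$$

which is by assumption bounded away from zero for $N \to \infty$.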